ETAPS Foreword


Welcome to the 22nd ETAPS! This is the first time that ETAPS took place in the Czech 
Republic in its beautiful capital Prague. 

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from
theoretical computer science to foundations to programming language developments,
analysis tools, formal approaches to software engineering, and security.

Organizing these conferences in a coherent, highly synchronized conference
program enables participation in an exciting event, offering the possibility to meet many
researchers working in different directions in the field and to easily attend talks of 
different conferences. ETAPS 2019 featured a new program item: the Mentoring 
Workshop. This workshop is intended to help students early in the program with advice 
on research, career, and life in the fields of computing that are covered by the ETAPS 
conference. On the weekend before the main conference, numerous satellite workshops 
took place and attracted many researchers from all over the globe. 

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, 
yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their
contributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of 
Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited 
speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac 
Flanagan (University of California at Santa Cruz). Invited tutorials were provided by 
Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare 
Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 
2019 attendants, I thank all the speakers for their inspiring and interesting talks! 

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles 
University. Charles University was founded in 1348 and was the first university in 
Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further 
supported by the following associations and societies: ETAPS e.V., EATCS (European 
Association for Theoretical Computer Science), EAPLS (European Association for 
Programming Languages and Systems), and EASST (European Association of
Software Science and Technology). The local organization team consisted of Jan Vitek and
Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech 
Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek. 
The ETAPS SC consists of an Executive Board and representatives of the
individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and
EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns
(Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen
(Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and
Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst
(Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz),
Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan),
Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester),
Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen
(Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller
(Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau),
Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg),
Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg),
Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim
(Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendants, organizers 
of the satellite workshops, and Springer for their support. I hope you all enjoy the 
proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local 
organization team for all their enormous efforts enabling a fantastic ETAPS in Prague! 


February 2019 Joost-Pieter Katoen 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


TACAS 2019 was the 25th edition of the International Conference on Tools and 
Algorithms for the Construction and Analysis of Systems conference series. 
TACAS 2019 was part of the 22nd European Joint Conferences on Theory and Practice 
of Software (ETAPS 2019). The conference was held at the Orea Hotel Pyramida in 
Prague, Czech Republic, during April 8–11, 2019.

Conference Description. TACAS is a forum for researchers, developers, and users 
interested in rigorously based tools and algorithms for the construction and analysis of 
systems. The conference aims to bridge the gaps between different communities with 
this common interest and to support them in their quest to improve the utility,
reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS
2019 solicited four types of submissions: 


— Research papers, identifying and justifying a principled advance to the theoretical 
foundations for the construction and analysis of systems, where applicable
supported by experimental validation.

— Case-study papers, reporting on case studies and providing information about the 
system being studied, the goals of the study, the challenges the system poses to 
automated analysis, research methodologies and approaches used, the degree to 
which goals were attained, and how the results can be generalized to other problems 
and domains. 

— Regular tool papers, presenting a new tool, a new tool component, or novel 
extensions to an existing tool, with an emphasis on design and implementation 
concerns, including software architecture and core data structures, practical 
applicability, and experimental evaluations. 

— Tool-demonstration papers (short), focusing on the usage aspects of tools. 


Paper Selection. This year, 164 papers were submitted to TACAS, among which 
119 were research papers, 10 case-study papers, 24 regular tool papers, and 11 were 
tool-demonstration papers. After a rigorous review process, with each paper reviewed 
by at least three Program Committee members, followed by an online discussion, the 
Program Committee accepted 29 research papers, 2 case-study papers, 11 regular tool 
papers, and 8 tool-demonstration papers (50 papers in total). 

Artifact-Evaluation Process. The main novelty of TACAS 2019 was that, for the 
first time, artifact evaluation was compulsory for all regular tool papers and tool 
demonstration papers. For research papers and case-study papers, artifact evaluation 
was optional. The artifact evaluation process was organized as follows: 


— Regular tool papers and tool demonstration papers. The authors of the 35 
submitted papers in these categories were required to submit an artifact
alongside their paper submission. Each artifact was evaluated independently by
three reviewers. Out of the 35 artifact submissions, 28 were successfully evaluated,
which corresponds to an acceptance rate of 80%. The AEC used a two-phase
reviewing process: Reviewers first performed an initial check to see whether the
artifact was technically usable and whether the accompanying instructions were
consistent, followed by a full evaluation of the artifact. The main criterion for
artifact acceptance was consistency with the paper, with completeness and
documentation being handled in a more lenient manner as long as the artifact was useful
overall. The reviewers were instructed to check whether results are consistent with
what is described in the paper. Inconsistencies were to be clearly pointed out and
explained by the authors. In addition to the textual reviews, reviewers also proposed
a numeric score indicating (potentially weak) acceptance or rejection of the artifact. After
the evaluation process, the results of the artifact evaluation were summarized and
forwarded to the discussion of the papers, so as to enable the reviewers of the papers
to take the evaluation into account. In all but three cases, tool papers whose artifacts
did not pass the evaluation were rejected.

— Research papers and case-study papers. For this category of papers, artifact 
evaluation was voluntary. The authors of each of the 25 accepted papers were 
invited to submit an artifact immediately after the acceptance notification. Owing to 
the short time available for the process and acceptance of the artifact not being 
critical for paper acceptance, there was only one round of evaluation for this 
category, and every artifact was assigned to two reviewers. The artifacts were 
evaluated using the same criteria as for tool papers. Out of the 18 submitted artifacts 
of this phase, 15 were successfully evaluated (83% acceptance rate) and were 
awarded the TACAS 2019 AEC badge, which is added to the title page of the 
respective paper if desired by the authors. 


TOOLympics. TOOLympics 2019 was part of the celebration of the 25th
anniversary of the TACAS conference. The goal of TOOLympics is to acknowledge the
achievements of the various competitions in the field of formal methods, and to 
understand their commonalities and differences. A total of 24 competitions joined 
TOOLympics and were presented at the event. An overview and competition reports of 
11 competitions are included in the third volume of the TACAS 2019 proceedings, 
which are dedicated to the 25th anniversary of TACAS. The extra volume contains a 
review of the history of TACAS, the TOOLympics papers, and the papers of the annual 
Competition on Software Verification. 

Competition on Software Verification. TACAS 2019 also hosted the 8th International
Competition on Software Verification (SV-COMP), chaired and organized by
Dirk Beyer. The competition again had high participation: 31 verification systems with
developers from 14 countries were submitted for the systematic comparative evaluation,
including three submissions from industry. The TACAS proceedings include the
competition report and short papers describing 11 of the participating verification 
systems. These papers were reviewed by a separate program committee (PC); each 
of the papers was assessed by four reviewers. Two sessions in the TACAS program 
(this year as part of the TOOLympics event) were reserved for the presentation of the 
results: the summary by the SV-COMP chair and the participating tools by the 
developer teams in the first session, and the open jury meeting in the second session. 

Acknowledgments. We would like to thank everyone who helped to make TACAS 
2019 successful. In particular, we would like to thank the authors for submitting their 
papers to TACAS 2019. We would also like to thank all PC members, additional
reviewers, as well as all members of the artifact evaluation committee (AEC) for their 
detailed and informed reviews and, in the case of the PC and AEC members, also for 
their discussions during the virtual PC and AEC meetings. We also thank the Steering 
Committee for their advice. Special thanks go to the Organizing Committee of ETAPS 
2019 and its general chairs, Jan Kofron and Jan Vitek, to the chair of the ETAPS 2019 
executive board, Joost-Pieter Katoen, and to the publication team at Springer. 
Abstract. We propose an automated method for computing inductive invariants
used to prove deadlock freedom of parametric component-based systems.
The method generalizes the approach for computing structural trap invariants
from bounded to parametric systems with general architectures. It symbolically
extracts trap invariants from interaction formulae defining the system architecture.
The paper presents the theoretical foundations of the method, including new
results for the first order monadic logic, and proves its soundness. It also reports
on a preliminary experimental evaluation on several textbook examples.


Modern computing systems exhibit dynamic and reconfigurable behavior. To tackle the 
complexity of such systems, engineers extensively use architectures that enforce, by 
construction, essential properties, such as fault tolerance or mutual exclusion.
Architectures can be viewed as parametric operators that take as arguments instances of
components of given types and enforce a characteristic property. For instance, client-server
architectures enforce atomicity and resilience of transactions, for any numbers of clients 
and servers. Similarly, token-ring architectures enforce mutual exclusion between any 
number of components in the ring. 

Parametric verification is an extremely relevant and challenging problem in
systems engineering. In contrast to the verification of bounded systems, consisting of a
known set of components, there exist no general methods and tools successfully applied
to parametric systems. Verification problems for very simple parametric systems, even
with finite-state components, are typically intractable [10,16]. Most work in this area
puts emphasis on limitations determined mainly by three criteria: (1) the topology of the
architecture, (2) the coordination primitives, and (3) the properties to be verified.

The main decidability results reduce parametric verification to the verification of a 
bounded number of instances of finite state components. Several methods try to
determine a cut-off size of the system, i.e. the minimal size for which if a property holds, then
it holds for any size, e.g. Suzuki [20], Emerson and Namjoshi [15]. Other methods
identify systems with well-structured transition relations, for which symbolic enumeration


The research leading to these results has received funding from the European Union Horizon 
2020 research and innovation programme under grant agreement no. 700665 CITADEL (Critical 
Infrastructure Protection using Adaptive MILS) and no. 730086 ERGO (European Robotic Goal- 
Oriented Autonomous Controller). 

© The Author(s) 2019 


T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 3-20, 2019. 
https://doi.org/10.1007/978-3-030-17465-1_1 


4 M. Bozga et al. 


of reachable states is feasible [1] or reduce to known decidable problems, such as
reachability in vector addition systems [16]. Typically, these methods apply to systems with
global coordination. When theoretical decidability is not of concern, semi-algorithmic
techniques such as regular model checking [2,17], SMT-based bounded model
checking [3,14], abstraction [8,11] and automata learning [13] can be used to deal with more
general classes of systems. The interested reader can find a complete survey on parameterized
model checking by Bloem et al. [10].

This paper takes a different angle of attack to the verification problem, seeking
generality of the type of parametric systems and focusing on the verification of a particular
but essential property: deadlock-freedom. The aim is to come up with effective methods 
for checking deadlock-freedom, by overcoming the complexity blowup stemming from 
the effective generation of reachability sets. We briefly describe our approach below. 

A system is the composition of a finite number of component instances of 
given types, using interactions that follow the Behaviour-Interaction-Priorities (BIP) 
paradigm [7]. To simplify the technical part, we assume that components and
interactions are finite abstractions of real-life systems. An instance is a finite-state transition
system whose edges are labeled by ports. The instances communicate synchronously 
via a number of simultaneous interactions involving a set of ports each, such that no 
data is exchanged during interactions. If the number of instances in the system is fixed 
and known in advance, we say that the system is bounded, otherwise it is parametric. 


[Figure: (a) Bounded System, with interaction formula $\Gamma = (a \wedge b_1) \vee (a \wedge b_2) \vee (e \wedge f_1) \vee (e \wedge f_2)$; (b) Parametric System, with interaction formula $\Gamma = a \wedge \exists i . b(i) \vee e \wedge \exists i . f(i)$]


Fig. 1. Mutual exclusion example 


For instance, the bounded system in Fig. 1a consists of component types Semaphore,
with one instance, and Task, with two instances. A semaphore goes from the free state
$r$ to the taken state $s$ by an acquire action $a$, and vice versa from $s$ to $r$ by a release
action $e$. A task goes from waiting $w$ to busy $u$ by action $b$ and vice versa, by action
$f$. For the bounded system in Fig. 1a, the interactions are $\{a, b_1\}$, $\{a, b_2\}$, $\{e, f_1\}$ and
$\{e, f_2\}$, depicted with dashed lines. Since the number of instances is known in advance,
we can view an interaction as a minimal satisfying valuation of the boolean formula
$\Gamma = (a \wedge b_1) \vee (a \wedge b_2) \vee (e \wedge f_1) \vee (e \wedge f_2)$, where the port symbols are propositional
variables. Because every instance has finitely many states, we can write a boolean formula
$\Delta = [\neg r \vee \neg(w_1 \vee w_2)] \wedge [\neg s \vee \neg(u_1 \vee u_2)]$, this time over propositional state variables,
which defines the configurations in which all interactions are disabled (deadlock).
Proving that no deadlock configuration is reachable from the initial configuration $r \wedge w_1 \wedge w_2$
requires finding an over-approximation (invariant) $I$ of the reachable configurations,
such that the conjunction $I \wedge \Delta$ is not satisfiable.

The basic idea of our method, supported by the D-Finder deadlock detection
tool [9] for bounded component-based systems, is to compute an invariant straight
from the interaction formula, without going through costly abstract fixpoint
iterations. The invariants we are looking for are in fact solutions of a system of boolean
constraints $\Theta(\Gamma)$, of size linear in the size of $\Gamma$ (written in DNF). In our example,
$\Theta(\Gamma) = \bigwedge_{i=1,2} (r \vee w_i) \leftrightarrow (s \vee u_i)$. Finding the (minimal) solutions of this constraint
can be done, as currently implemented in D-Finder, by exhaustive model enumeration
using a SAT solver. Here we propose a more efficient solution, which consists in
writing $\Theta(\Gamma)$ in DNF and removing the negative literals from each minterm. In our case, this
gives the invariant $I = (r \vee s) \wedge \bigwedge_{i=1,2} (w_i \vee u_i) \wedge (r \vee u_1 \vee u_2) \wedge (s \vee w_1 \vee w_2)$ and $I \wedge \Delta$
is proved unsatisfiable using a SAT solver.
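
The running example is small enough to check by brute force. The following Python sketch is only an illustration, not the tool's implementation: it enumerates all valuations instead of calling a SAT solver or using the DNF-based method, and the names `theta`, `init`, `deadlock` and `TRAPS` are chosen here for readability. It recovers the minimal solutions of the trap constraint that contain an initially marked place, and confirms that the resulting invariant excludes every deadlock configuration.

```python
from itertools import product

# Propositional state variables of the bounded system in Fig. 1a.
VARS = ["r", "s", "w1", "w2", "u1", "u2"]


def theta(v):
    """Trap constraint: (r or w_i) <-> (s or u_i) for i = 1, 2."""
    return all((v["r"] or v[f"w{i}"]) == (v["s"] or v[f"u{i}"]) for i in (1, 2))


def init(v):
    """At least one initially marked place (r, w1, w2) is selected."""
    return v["r"] or v["w1"] or v["w2"]


def deadlock(v):
    """Delta: every interaction is disabled."""
    return ((not v["r"] or not (v["w1"] or v["w2"]))
            and (not v["s"] or not (v["u1"] or v["u2"])))


def valuations():
    """Enumerate all boolean valuations of VARS."""
    for bits in product([False, True], repeat=len(VARS)):
        yield dict(zip(VARS, bits))


def minimal_marked_traps():
    """Minimal models of theta and init, each given as its set of true variables."""
    models = {frozenset(x for x in VARS if v[x])
              for v in valuations() if theta(v) and init(v)}
    return {m for m in models if not any(o < m for o in models)}


TRAPS = minimal_marked_traps()


def invariant(v):
    """Trap invariant: every minimal marked trap contains a marked place."""
    return all(any(v[x] for x in trap) for trap in TRAPS)


# The system is proved deadlock-free if no valuation satisfies both formulae.
DEADLOCK_FREE = not any(invariant(v) and deadlock(v) for v in valuations())
```

The five sets in `TRAPS` correspond to the five clauses of the invariant $I$ given above.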

The main contribution of this paper is the generalization of this invariant generation
method to the parametric case. To understand the problem, consider the parametric
system from Fig. 1b, in which a Semaphore interacts with $n$ Tasks, where $n > 0$ is not
known in advance. The interactions are described by a fragment of first order logic,
in which the ports are either propositional or monadic predicate symbols, in our case
$\Gamma = a \wedge \exists i . b(i) \vee e \wedge \exists i . f(i)$. This logic, called Monadic Interaction Logic (MIL), is also
used to express the constraints $\Theta(\Gamma)$ and compute their solutions. In our case, we obtain
$I = (r \vee s) \wedge [\forall i . w(i) \vee u(i)] \wedge [r \vee \exists i . u(i)] \wedge [s \vee \exists i . w(i)]$. As in the bounded case, we can
give a parametric description of deadlock configurations $\Delta = [\neg r \vee \neg \exists i . w(i)] \wedge [\neg s \vee
\neg \exists i . u(i)]$ and prove that $I \wedge \Delta$ is unsatisfiable, using the decidability of MIL, based on
an early small model property result due to Löwenheim [19]. In practice, we avoid the
model enumeration suggested by this result and check the satisfiability of such queries
using a decidable theory of sets with cardinality constraints [18], available in the CVC4
SMT solver [4].
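
As a sanity check of the parametric formulae, the sketch below evaluates $I \wedge \Delta$ by brute force for a few fixed values of $n$. This is an assumption-laden illustration only: it checks finitely many instance counts and therefore proves nothing for arbitrary $n$, which is what the MIL decision procedure provides. The function names are hypothetical.

```python
from itertools import product


def invariant(r, s, w, u):
    """I = (r or s) and [forall i. w(i) or u(i)] and [r or exists i. u(i)] and [s or exists i. w(i)]."""
    return ((r or s)
            and all(wi or ui for wi, ui in zip(w, u))
            and (r or any(u))
            and (s or any(w)))


def deadlock(r, s, w, u):
    """Delta = [not r or not exists i. w(i)] and [not s or not exists i. u(i)]."""
    return (not r or not any(w)) and (not s or not any(u))


def has_deadlock_in_invariant(n):
    """True if some configuration with n tasks satisfies both I and Delta."""
    for bits in product([False, True], repeat=2 + 2 * n):
        r, s = bits[0], bits[1]
        w, u = bits[2:2 + n], bits[2 + n:]
        if invariant(r, s, w, u) and deadlock(r, s, w, u):
            return True
    return False


# Check a handful of instance counts n > 0.
RESULTS = {n: has_deadlock_in_invariant(n) for n in range(1, 5)}
```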

The paper is structured as follows: Sect. 1 presents existing results for checking 
deadlock-freedom of bounded systems using invariants, Sect. 2 formalizes the approach 
for computing invariants using MIL, Sect. 3 introduces cardinality constraints for
invariant generation, Sect. 4 presents the integration of the above results within a verification
technique for parametric systems and Sect. 5 reports on preliminary experiments carried 
out with a prototype tool. Finally, Sect. 6 presents concluding remarks and future work 
directions. For reasons of space, all proofs are given in [12]. 


1 Bounded Component-Based Systems 


A component is a tuple $C = (P, S, s_0, \Delta)$, where $P = \{p, q, r, \ldots\}$ is a finite set of ports,
$S$ is a finite set of states, $s_0 \in S$ is an initial state and $\Delta \subseteq S \times P \times S$ is a set of
transitions written $s \xrightarrow{p} s'$. To simplify the technical details, we assume there are no two
different transitions with the same port, i.e. if $s_1 \xrightarrow{p_1} s_1', s_2 \xrightarrow{p_2} s_2' \in \Delta$ and $s_1 \neq s_2$ or
$s_1' \neq s_2'$ then $p_1 \neq p_2$. In general, this restriction can be lifted, at the cost of cluttering
the presentation.

A bounded system $\mathcal{S} = (C^1, \ldots, C^n, \Gamma)$ consists of a fixed number ($n$) of components
$C^k = (P^k, S^k, s_0^k, \Delta^k)$ and an interaction formula $\Gamma$, describing the allowed interactions.
Since the number of components is known in advance, we write interaction formulae
using boolean logic over the set of propositional variables $\mathsf{BVar} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} (P^k \cup S^k)$. Here
we intentionally use the names of states and ports as propositional variables.

A boolean interaction formula is either $a \in \mathsf{BVar}$, $f_1 \wedge f_2$ or $\neg f_1$, where $f_i$ are
formulae, for $i = 1, 2$, respectively. We define the usual shorthands $f_1 \vee f_2 \stackrel{\text{def}}{=} \neg(\neg f_1 \wedge
\neg f_2)$, $f_1 \to f_2 \stackrel{\text{def}}{=} \neg f_1 \vee f_2$, $f_1 \leftrightarrow f_2 \stackrel{\text{def}}{=} (f_1 \to f_2) \wedge (f_2 \to f_1)$. A literal is either
a variable or its negation and a minterm is a conjunction of literals. A formula is in
disjunctive normal form (DNF) if it is written as $\bigvee_{i=1}^{n} \bigwedge_{j=1}^{m_i} f_{ij}$, where $f_{ij}$ is a literal.
A formula is positive if and only if each variable occurs under an even number of
negations, or, equivalently, its DNF form contains no negative literals. We assume
interaction formulae of bounded systems to be always positive.


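A boolean valuation $\beta : \mathsf{BVar} \to \{\top, \bot\}$ maps each propositional variable to either true
($\top$) or false ($\bot$). We write $\beta \models f$ if and only if $f = \top$, when replacing each boolean
variable $a$ with $\beta(a)$ in $f$. We say that $\beta$ is a model of $f$ in this case and write $f \equiv g$
for $[\![f]\!] = [\![g]\!]$, where $[\![f]\!] \stackrel{\text{def}}{=} \{\beta \mid \beta \models f\}$. Given two valuations $\beta_1$ and $\beta_2$ we write
$\beta_1 \subseteq \beta_2$ if and only if $\beta_1(a) = \top$ implies $\beta_2(a) = \top$, for each variable $a \in \mathsf{BVar}$. We
write $f \equiv^\mu g$ for $[\![f]\!]^\mu = [\![g]\!]^\mu$, where $[\![f]\!]^\mu \stackrel{\text{def}}{=} \{\beta \in [\![f]\!] \mid \text{for all } \beta' : \beta' \subseteq \beta \text{ and } \beta' \neq
\beta \text{ only if } \beta' \notin [\![f]\!]\}$ is the set of minimal models of $f$.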
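
To make the notion of minimal model concrete, here is a small Python sketch that enumerates the minimal models of the interaction formula of the mutual exclusion example. Representing a valuation by the set of variables it maps to true, and the helper name `minimal_models`, are conventions assumed here for illustration.

```python
from itertools import product


def minimal_models(variables, formula):
    """Return the minimal models of `formula`, each as the set of true variables."""
    models = []
    for bits in product([False, True], repeat=len(variables)):
        beta = dict(zip(variables, bits))
        if formula(beta):
            models.append(frozenset(v for v in variables if beta[v]))
    # A model is minimal if no other model is a strict subset of it.
    return {m for m in models if not any(o < m for o in models)}


PORTS = ["a", "b1", "b2", "e", "f1", "f2"]


def gamma(v):
    """Interaction formula (a and b1) or (a and b2) or (e and f1) or (e and f2)."""
    return ((v["a"] and v["b1"]) or (v["a"] and v["b2"])
            or (v["e"] and v["f1"]) or (v["e"] and v["f2"]))


INTERACTIONS = minimal_models(PORTS, gamma)
```

Each minimal model is one of the four interactions of the example; non-minimal models such as $\{a, b_1, b_2\}$ are discarded.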


1.1 Execution Semantics of Bounded Systems 


We use 1-safe marked Petri Nets to define the set of executions of a bounded system.
A Petri Net (PN) is a tuple $N = (S, T, E)$, where $S$ is a set of places, $T$ is a set of
transitions, $S \cap T = \emptyset$, and $E \subseteq S \times T \cup T \times S$ is a set of edges. The elements of $S \cup T$
are called nodes. For a node $n$, let ${}^\bullet n \stackrel{\text{def}}{=} \{m \in S \cup T \mid E(m,n) = 1\}$, $n^\bullet \stackrel{\text{def}}{=} \{m \in S \cup T \mid
E(n,m) = 1\}$ and lift these definitions to sets of nodes, as usual.

A marking for a PN $N = (S, T, E)$ is a function $m : S \to \mathbb{N}$. A marked Petri net is a pair
$\mathcal{N} = (N, m_0)$, where $m_0$ is the initial marking of $N = (S, T, E)$. We consider that the reader
is familiar with the standard execution semantics of a marked PN. A marking $m$ is reachable
in $\mathcal{N}$ if and only if there exists a sequence of transitions leading from $m_0$ to $m$. We denote
by $\mathcal{R}(\mathcal{N})$ the set of reachable markings of $\mathcal{N}$. A set of markings $M$ is an invariant of
$\mathcal{N} = (N, m_0)$ if and only if $m_0 \in M$ and $M$ is closed under the transitions of $N$. A marked PN
$\mathcal{N}$ is 1-safe if $m(s) \leq 1$, for each $s \in S$ and each $m \in \mathcal{R}(\mathcal{N})$. In the following, we consider
only marked PNs that are 1-safe. In this case, any (necessarily finite) set of reachable
markings can be defined by a boolean formula,

Fig. 2. PN for mutual exclusion 
which identifies markings with the induced boolean valuations. A marking $m$ is a
deadlock if no transition is enabled in $m$, and we denote by $\mathcal{D}(\mathcal{N})$ the set of deadlocks of $\mathcal{N}$. A
marked PN $\mathcal{N}$ is deadlock-free if and only if $\mathcal{R}(\mathcal{N}) \cap \mathcal{D}(\mathcal{N}) = \emptyset$. A sufficient condition
for deadlock freedom is $M \cap \mathcal{D}(\mathcal{N}) = \emptyset$, for some invariant $M$ of $\mathcal{N}$.

In the rest of this section, we fix a bounded system $\mathcal{S} = (C^1, \ldots, C^n, \Gamma)$, where $C^k =
(P^k, S^k, s_0^k, \Delta^k)$, for all $k \in [1,n]$ and $\Gamma$ is a positive boolean formula, over propositional
variables denoting ports. The set of executions of $\mathcal{S}$ is given by the 1-safe marked PN
$\mathcal{N}_\mathcal{S} = (N, m_0)$, where $N = (\bigcup_{i=1}^{n} S^i, T, E)$, $m_0(s) = 1$ if and only if $s \in \{s_0^i \mid i \in [1,n]\}$
and $T$, $E$ are as follows. For each minimal model $\beta \in [\![\Gamma]\!]^\mu$, we have a transition $t_\beta \in T$
and edges $(s_i, t_\beta), (t_\beta, s_i') \in E$, for all $i \in [1,n]$ such that $s_i \xrightarrow{p_i} s_i' \in \Delta^i$ and $\beta(p_i) = \top$.
Moreover, nothing else is in $T$ or $E$.

For example, the marked PN from Fig. 2 describes the set of executions of the
bounded system from Fig. 1a. Note that each transition of the PN corresponds to a
minimal model of the interaction formula $\Gamma = a \wedge b_1 \vee a \wedge b_2 \vee e \wedge f_1 \vee e \wedge f_2$, or
equivalently, to the set of (necessarily positive) literals of some minterm in the DNF
of $\Gamma$.


1.2 Proving Deadlock Freedom of Bounded Systems 


A bounded system $\mathcal{S}$ is deadlock-free if and only if its corresponding marked PN $\mathcal{N}_\mathcal{S}$
is deadlock-free. In the following, we prove deadlock-freedom of a bounded system,
by defining a class of invariants that are particularly useful for excluding unreachable
deadlock markings.

Given a Petri Net $N = (S, T, E)$, a set of places $W \subseteq S$ is called a trap if and only if
$W^\bullet \subseteq {}^\bullet W$. A trap $W$ of $N$ is a marked trap of the marked PN $\mathcal{N} = (N, m_0)$ if and only if
$m_0(s) = \top$ for some $s \in W$. A minimal marked trap is a marked trap such that none of
its strict subsets is a marked trap. A marked trap defines an invariant of the PN because
some place in the trap will always be marked, no matter which transition is fired. The
trap invariant of $\mathcal{N}$ is the least set of markings that mark each trap of $\mathcal{N}$. Clearly, the
trap invariant of $\mathcal{N}$ subsumes the set of reachable markings of $\mathcal{N}$, because the latter is
the least invariant of $\mathcal{N}$ and invariants are closed under intersection¹.
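
The trap condition $W^\bullet \subseteq {}^\bullet W$ can be checked directly on the Petri net of the mutual exclusion example. The Python sketch below hard-codes that net, with one transition per interaction; the dictionary encoding and the function names are assumptions made here for illustration.

```python
# One transition per interaction: name -> (input places, output places).
TRANSITIONS = {
    "a,b1": ({"r", "w1"}, {"s", "u1"}),
    "a,b2": ({"r", "w2"}, {"s", "u2"}),
    "e,f1": ({"s", "u1"}, {"r", "w1"}),
    "e,f2": ({"s", "u2"}, {"r", "w2"}),
}
INITIAL = {"r", "w1", "w2"}  # initially marked places


def is_trap(places):
    """W is a trap if every transition consuming from W also produces into W."""
    successors = {t for t, (pre, _) in TRANSITIONS.items() if pre & places}
    predecessors = {t for t, (_, post) in TRANSITIONS.items() if post & places}
    return successors <= predecessors


def is_marked_trap(places):
    """A trap is marked if it contains at least one initially marked place."""
    return is_trap(places) and bool(places & INITIAL)
```

For instance, `is_marked_trap({"r", "s"})` and `is_marked_trap({"r", "u1", "u2"})` hold, whereas `{"r"}` alone is not a trap, because firing the interaction $\{a, b_1\}$ empties it.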


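Lemma 1. Given a bounded system $\mathcal{S}$, the boolean formula:
$\mathit{Trap}(\mathcal{N}_\mathcal{S}) \stackrel{\text{def}}{=} \bigwedge \{ \bigvee_{i=1}^{k} s_i \mid \{s_1, \ldots, s_k\} \text{ is a marked trap of } \mathcal{N}_\mathcal{S} \}$
defines an invariant of $\mathcal{N}_\mathcal{S}$.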


Next, we describe a method of computing trap invariants that does not explicitly
enumerate all the marked traps of a marked PN. First, we consider a trap constraint
$\Theta(\Gamma)$, derived from the interaction formula $\Gamma$, in linear time. By slight abuse of notation,
we define, for a given port $p \in P^i$ of the component $C^i$, for some $i \in [1, n]$, the pre- and
post-state of $p$ in $C^i$ as ${}^\bullet p \stackrel{\text{def}}{=} s$ and $p^\bullet \stackrel{\text{def}}{=} s'$, where $s \xrightarrow{p} s'$ is the unique rule² involving


1 The intersection of two or more invariants is again an invariant.
2 We have assumed that each port is associated with a unique transition rule.
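$p$ in $\Delta^i$, and ${}^\bullet p = p^\bullet = \bot$ if there is no such rule. Assuming that the interaction formula
is written in DNF as $\Gamma = \bigvee_{k=1}^{N} \bigwedge_{\ell=1}^{M_k} p_{k\ell}$, we define the trap constraint:


$\Theta(\Gamma) \stackrel{\text{def}}{=} \bigwedge_{k=1}^{N} \left( \bigvee_{\ell=1}^{M_k} {}^\bullet p_{k\ell} \right) \to \left( \bigvee_{\ell=1}^{M_k} p_{k\ell}^\bullet \right)$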


It is not hard to show³ that any satisfying valuation of $\Theta(\Gamma)$ defines a trap of $\mathcal{N}_\mathcal{S}$ and,
moreover, any such trap is defined in this way. We also consider the formula $\mathit{Init}(\mathcal{S}) \stackrel{\text{def}}{=}
\bigvee_{k=1}^{n} s_0^k$ defining the set of initially marked places of $\mathcal{S}$, and prove the following:


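Lemma 2. Let $\mathcal{S}$ be a bounded system with interaction formula $\Gamma$ and $\beta$ be a boolean
valuation. Then $\beta \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]$ iff $\{s \mid \beta(s) = \top\}$ is a marked trap of $\mathcal{N}_\mathcal{S}$. Moreover,
$\beta \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]^\mu$ iff $\{s \mid \beta(s) = \top\}$ is a minimal marked trap of $\mathcal{N}_\mathcal{S}$.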


Because $\Theta(\Gamma)$ and $\mathit{Init}(\mathcal{S})$ are boolean formulae, it is, in principle, possible to
compute the trap invariant $\mathit{Trap}(\mathcal{N}_\mathcal{S})$ by enumerating the (minimal) models of $\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})$
and applying the definition from Lemma 1. However, model enumeration is inefficient
and, moreover, does not admit generalization for the parametric case, in which the size
of the system is unknown. For these reasons, we prefer a computation of the trap
invariant, based on two symbolic transformations of boolean formulae, described next.

First, for a formula $f$ we denote by $f^+$ the positive formula obtained by deleting all
negative literals from the DNF of $f$. We shall call this operation positivation. Second,
for a positive boolean formula $f$, we define the dual formula $f^\sim$ recursively on the
structure of $f$, as follows: $(f_1 \wedge f_2)^\sim \stackrel{\text{def}}{=} f_1^\sim \vee f_2^\sim$, $(f_1 \vee f_2)^\sim \stackrel{\text{def}}{=} f_1^\sim \wedge f_2^\sim$ and $a^\sim \stackrel{\text{def}}{=} a$,
for any $a \in \mathsf{BVar}$. Note that $f^\sim$ is equivalent to the negation of the formula obtained
from $f$ by substituting each variable $a$ with $\neg a$ in $f$.
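
Both transformations are easy to state on formulae kept in DNF. The Python sketch below assumes a DNF is encoded as a list of minterms, each a list of `(variable, is_positive)` literals; this encoding and the function names are illustrative choices, not taken from the paper. Dualizing a positive DNF then amounts to reading every minterm as a clause of a CNF, and the last function checks the stated equivalence with the negation of the formula in which every variable is flipped.

```python
from itertools import product


def positivate(dnf):
    """Delete the negative literals from every minterm of a DNF."""
    return [frozenset(name for name, positive in minterm if positive)
            for minterm in dnf]


def dualize(positive_dnf):
    """Dual of a positive DNF, returned as a CNF (one clause per minterm)."""
    return [frozenset(minterm) for minterm in positive_dnf]


def eval_dnf(dnf, beta):
    """Evaluate a positive DNF (disjunction of conjunctions) under beta."""
    return any(all(beta[a] for a in minterm) for minterm in dnf)


def eval_cnf(cnf, beta):
    """Evaluate a positive CNF (conjunction of disjunctions) under beta."""
    return all(any(beta[a] for a in clause) for clause in cnf)


def dual_is_negation_of_flipped(positive_dnf):
    """Check that the dual equals the negation of f with every variable negated."""
    names = sorted(set().union(*positive_dnf))
    for bits in product([False, True], repeat=len(names)):
        beta = dict(zip(names, bits))
        flipped = {a: not b for a, b in beta.items()}
        if eval_cnf(dualize(positive_dnf), beta) != (not eval_dnf(positive_dnf, flipped)):
            return False
    return True


# Example: (p and q) or (p and not r).
DNF = [[("p", True), ("q", True)], [("p", True), ("r", False)]]
POSITIVE = positivate(DNF)   # (p and q) or p
CNF = dualize(POSITIVE)      # (p or q) and p
```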

The following theorem gives the main result of this section, the symbolic computa- 
tion of the trap invariant of a bounded system, directly from its interaction formula. 


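Theorem 1. For any bounded system $\mathcal{S}$, with interaction formula $\Gamma$, we have:
$\mathit{Trap}(\mathcal{N}_\mathcal{S}) \equiv \left( (\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S}))^+ \right)^\sim$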


Intuitively, any satisfying valuation of $\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})$ defines an initially marked trap
of $\mathcal{N}_\mathcal{S}$ and a minimal such valuation defines a minimal such trap (Lemma 2). Instead of
computing the minimal satisfying valuations by model enumeration, we directly cast
the above formula in DNF and remove the negative literals. This is essentially because
the negative literals do not occur in the propositional definition of a set of places⁴.
Then the dualization of this positive formula yields the trap invariants in CNF, as a
conjunction over disjunctions of propositional variables corresponding to the places
inside a minimal initially marked trap.

Like any invariant, trap invariants can be used to prove absence of deadlocks in
a bounded system. Assuming, as before, that the interaction formula is given in DNF

3 See [5] for a proof.

4 If the DNF is $(p \wedge q) \vee (p \wedge \neg r)$, the dualization would give $(p \vee q) \wedge (p \vee \neg r)$. The first clause
corresponds to the trap $\{p, q\}$ (either $p$ or $q$ is marked), but the second does not directly define
a trap. However, by first removing the negative literals, we obtain the traps $\{p, q\}$ and $\{p\}$.
as $\Gamma = \bigvee_{k=1}^{N} \bigwedge_{\ell=1}^{M_k} p_{k\ell}$, we define the set of deadlock markings of $\mathcal{N}_\mathcal{S}$ by the formula
$\Delta(\Gamma) \stackrel{\text{def}}{=} \bigwedge_{k=1}^{N} \bigvee_{\ell=1}^{M_k} \neg({}^\bullet p_{k\ell})$. This is the set of configurations in which all interactions are
disabled. With this definition, proving deadlock freedom amounts to proving
unsatisfiability of a boolean formula.


Corollary 1. A bounded system $\mathcal{S}$ with interaction formula $\Gamma$ is deadlock-free if the
boolean formula $\left( (\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S}))^+ \right)^\sim \wedge \Delta(\Gamma)$ is unsatisfiable.


2 Parametric Component-Based Systems 


From now on we shall focus on parametric systems, consisting of a fixed set of
component types $C^1, \ldots, C^n$, such that the number of instances of each type is not known in
advance. These numbers are given by a function $M : [1,n] \to \mathbb{N}$, where $M(k)$ denotes
the number of components of type $C^k$ that are active in the system. To simplify the
technical presentation of the results, we assume that all instances of a component type are
created at once, before the system is started⁵. For the rest of this section, we fix a
parametric system $\mathcal{S} = (C^1, \ldots, C^n, M, \Gamma)$, where each component type $C^k = (P^k, S^k, s_0^k, \Delta^k)$
has the same definition as a component in a bounded system and $\Gamma$ is an interaction
formula, written in the fragment of first order logic, defined next.


2.1 Monadic Interaction Logic 


For each component type $C^k$, where $k \in [1,n]$, we assume a set of index variables $\mathsf{Var}^k$
and a set of predicate symbols $\mathsf{Pred}^k \stackrel{\text{def}}{=} P^k \cup S^k$. Similar to the bounded case, we use
state and port names as monadic (unary) predicate symbols. We also define the sets
$\mathsf{Var} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} \mathsf{Var}^k$ and $\mathsf{Pred} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} \mathsf{Pred}^k$. Moreover, we consider that $\mathsf{Var}^k \cap \mathsf{Var}^\ell = \emptyset$
and $\mathsf{Pred}^k \cap \mathsf{Pred}^\ell = \emptyset$, for all $1 \leq k < \ell \leq n$. For simplicity's sake, we assume that all
predicate symbols in $\mathsf{Pred}$ are of arity one. For component types $C^k$, such that $M(k) = 1$
and predicate symbols $pr \in \mathsf{Pred}^k$, we shall write $pr$ instead of $pr(1)$, as in the interaction
formula of the system from Fig. 1b. The syntax of the monadic interaction logic (MIL)
is given below:

$i, j \in \mathsf{Var}$    index variables
$\phi := i = j \mid pr(i) \mid \phi_1 \wedge \phi_2 \mid \neg \phi_1 \mid \exists i . \phi_1$


where, for each predicate atom $pr(i)$, if $pr \in \mathsf{Pred}^k$ and $i \in \mathsf{Var}^\ell$ then $k = \ell$. We use the shorthands $\forall i\,.\,\phi \stackrel{def}{=} \neg(\exists i\,.\,\neg\phi)$ and $distinct(i_1, \ldots, i_m) \stackrel{def}{=} \bigwedge_{1 \le j < \ell \le m} \neg\, i_j = i_\ell$⁶. A sentence is a formula in which all variables are in the scope of a quantifier. A formula is positive if each predicate symbol occurs under an even number of negations. The semantics of MIL is given in terms of structures $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$, where:

- $\mathfrak{U} \stackrel{def}{=} [1, \max_{k=1}^{n} M(k)]$ is the universe of instances, over which variables range,
- $\nu : \mathsf{Var} \to \mathfrak{U}$ is a valuation mapping variables to elements of the universe,
- $\iota : \mathsf{Pred} \to 2^{\mathfrak{U}}$ is an interpretation of predicates as subsets of the universe.

5 This is not a limitation, since dynamic instance creation can be simulated by considering that
all instances are initially in a waiting state, which is left as result of an interaction involving a
designated “spawn” port.

6 Throughout this paper, we consider that $\bigwedge_{i \in I} \phi_i = \top$ if $I = \emptyset$.


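For a structure $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$ and a formula $\phi$, the satisfaction relation $\mathcal{I} \models \phi$ is defined as:

$$\begin{array}{lcl}
\mathcal{I} \models \bot & \Leftrightarrow & \text{never} \\
\mathcal{I} \models i = j & \Leftrightarrow & \nu(i) = \nu(j) \\
\mathcal{I} \models pr(i) & \Leftrightarrow & \nu(i) \in \iota(pr) \\
\mathcal{I} \models \exists i\,.\,\phi_1 & \Leftrightarrow & (\mathfrak{U}, \nu[i \leftarrow m], \iota) \models \phi_1 \text{ for some } m \in [1, M(k)] \text{ provided that } i \in \mathsf{Var}^k
\end{array}$$

where $\nu[i \leftarrow m]$ is the valuation that acts as $\nu$, except for $i$, which is assigned to $m$. Whenever $\mathcal{I} \models \phi$, we say that $\mathcal{I}$ is a model of $\phi$. It is known that, if a MIL formula has a model, then it has a model with universe of cardinality at most exponential in the size (number of symbols) of the formula [19]. This result, due to Löwenheim, is among the first decidability results for a fragment of first order logic.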

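Structures are partially ordered by pointwise inclusion, i.e. for $\mathcal{I}_i = (\mathfrak{U}, \nu_i, \iota_i)$, for $i = 1, 2$, we write $\mathcal{I}_1 \subseteq \mathcal{I}_2$ iff $\iota_1(p) \subseteq \iota_2(p)$, for all $p \in \mathsf{Pred}$, and $\mathcal{I}_1 \subset \mathcal{I}_2$ iff $\mathcal{I}_1 \subseteq \mathcal{I}_2$ and $\mathcal{I}_1 \neq \mathcal{I}_2$. As before, we define the sets $[\![\phi]\!] = \{\mathcal{I} \mid \mathcal{I} \models \phi\}$ and $[\![\phi]\!]^\mu = \{\mathcal{I} \in [\![\phi]\!] \mid \forall \mathcal{I}'\,.\,\mathcal{I}' \subset \mathcal{I} \rightarrow \mathcal{I}' \notin [\![\phi]\!]\}$ of models and minimal models of a MIL formula, respectively. Given formulae $\phi_1$ and $\phi_2$, we write $\phi_1 \equiv \phi_2$ for $[\![\phi_1]\!] = [\![\phi_2]\!]$ and $\phi_1 \equiv^\mu \phi_2$ for $[\![\phi_1]\!]^\mu = [\![\phi_2]\!]^\mu$.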
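The ordering on structures and the notion of minimal model can be illustrated by brute-force enumeration over a small universe. The following is only a sketch: the helper names (`subsets`, `interpretations`, `included`, `minimal_models`) and the sample sentence are hypothetical, the formula is passed as a Python predicate over interpretations, and valuations are ignored because only sentences are considered.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of the universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def interpretations(preds, universe):
    """All interpretations: maps from predicate names to subsets of the universe."""
    for combo in product(subsets(universe), repeat=len(preds)):
        yield dict(zip(preds, combo))

def included(i1, i2):
    """Pointwise inclusion of two interpretations."""
    return all(i1[p] <= i2[p] for p in i1)

def minimal_models(formula, preds, universe):
    """Models of `formula` that have no strictly smaller model."""
    models = [i for i in interpretations(preds, universe) if formula(i)]
    return [m for m in models
            if not any(included(o, m) and o != m for o in models)]

# Hypothetical sentence: exists i . p(i) or q(i), over the universe {1, 2}.
phi = lambda iota: len(iota["p"] | iota["q"]) >= 1
mins = minimal_models(phi, ["p", "q"], range(1, 3))
```

Each minimal model found this way places exactly one element in exactly one of the two predicates.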


2.2 Execution Semantics of Parametric Systems 


We consider the interaction formulae of parametric systems to be finite disjunctions of formulae of the form below:

$$\exists i_1 \ldots \exists i_\ell\,.\,\varphi \wedge \bigwedge_{j=1}^{\ell} p_j(i_j) \wedge \bigwedge_{j=\ell+1}^{\ell+m} \forall i_j\,.\,\psi_j \rightarrow p_j(i_j) \qquad (1)$$

where $\varphi, \psi_{\ell+1}, \ldots, \psi_{\ell+m}$ are conjunctions of equalities and disequalities involving index variables. Intuitively, the formulae (1) state that there are at most $\ell$ component instances that engage in a multiparty rendez-vous interaction on ports $p_1(i_1), \ldots, p_\ell(i_\ell)$, together with a broadcast to the ports $p_{\ell+1}(i_{\ell+1}), \ldots, p_{\ell+m}(i_{\ell+m})$ of the instances that fulfill the constraints $\psi_{\ell+1}, \ldots, \psi_{\ell+m}$. Observe that, if $m = 0$, the above formula corresponds to a multiparty (generalized) rendez-vous interaction $\exists i_1 \ldots \exists i_\ell\,.\,\varphi \wedge \bigwedge_{j=1}^{\ell} p_j(i_j)$. An example of peer-to-peer rendez-vous is the parametric system from Fig. 1. Another example of broadcast is given below.


Example 1. Consider the parametric system obtained from an arbitrary number of Worker components (Fig. 3), where $C^1 = Worker$, $\mathsf{Var}^1 = \{i, i_1, i_2, j\}$ and $\mathsf{Pred}^1 = \{a, b, f, u, w\}$. Any pair of instances can jointly execute the $b$ (begin) action provided all others are taking the $a$ (await) action. Any instance can also execute alone the $f$ (finish) action.


The execution semantics of a parametric system $S$ is the marked PN $\mathcal{N}_S = (N, m_0)$, where $N = (\bigcup_{k=1}^{n} S^k \times [1, M(k)], T, E)$, $m_0((s_0^k, i)) = 1$, for all $k \in [1,n]$ and $i \in [1, M(k)]$, and the sets of transitions $T$ and edges $E$ are defined next. For each minimal model $\mathcal{I} = (\mathfrak{U}, \nu, \iota) \in [\![\Gamma]\!]^\mu$, we have a transition $t_{\mathcal{I}} \in T$ and the edges $((s_i, k), t_{\mathcal{I}}), (t_{\mathcal{I}}, (s'_i, k)) \in E$ for all $i \in [1,n]$ such that $s_i \xrightarrow{p_i} s'_i \in \Delta^i$ and $k \in \iota(p_i)$. Moreover, nothing else is in $T$ or $E$.

[Figure: Worker instances $Worker(i_1)$, $Worker(i_2)$ and $Worker(j)$, with ports $a$, $b$ and $f$]

$$\Gamma = [\exists i_1 \exists i_2\,.\,i_1 \neq i_2 \wedge b(i_1) \wedge b(i_2) \wedge \forall j\,.\,j \neq i_1 \wedge j \neq i_2 \rightarrow a(j)] \vee \exists i\,.\,f(i)$$

Fig. 3. Parametric system with broadcast

As a remark, unlike in the case of bounded systems, the size of the marked PN Ns, 
that describes the execution semantics of a parametric system S, depends on the maxi- 
mum number of instances of each component type. The definition of the trap invariant 
Trap(Ns) is the same as in the bounded case, except that, in this case, the size of the 
boolean formula depends on the (unbounded) number of instances in the system. The 
challenge, addressed in the following, is to define trap invariants using MIL formulae of 
a fixed size. 


2.3 Computing Parametric Trap Invariants 


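To start with, we define the trap constraint $\Theta(\Gamma)$ of an interaction formula $\Gamma$ consisting of a finite disjunction of (1) formulae, as a finite conjunction of formulae of the form below:

$$\forall i_1 \ldots \forall i_\ell\,.\,\left[\varphi \wedge \left(\bigvee_{j=1}^{\ell} {}^{\bullet}p_j(i_j) \vee \bigvee_{j=\ell+1}^{\ell+m} \exists i_j\,.\,\psi_j \wedge {}^{\bullet}p_j(i_j)\right)\right] \rightarrow \left[\bigvee_{j=1}^{\ell} p_j(i_j)^{\bullet} \vee \bigvee_{j=\ell+1}^{\ell+m} \exists i_j\,.\,\psi_j \wedge p_j(i_j)^{\bullet}\right]$$

where, for a port $p \in P^k$ of some component type $C^k$, ${}^{\bullet}p(i)$ and $p(i)^{\bullet}$ denote the unique predicate atoms $s(i)$ and $s'(i)$, such that $s \xrightarrow{p} s'$ is the (unique) transition involving $p$ in $\Delta^k$, or $\bot$ if there is no such rule.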


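Example 2. The trap constraint for the parametric (rendez-vous) system in Fig. 1b is $\forall i\,.\,[r \vee w(i)] \rightarrow [s \vee u(i)] \wedge \forall i\,.\,[s \vee u(i)] \rightarrow [r \vee w(i)]$. Analogously, the trap constraint for the parametric (broadcast) system in Fig. 3 is:

$$\forall i_1 \forall i_2\,.\,[i_1 \neq i_2 \wedge (w(i_1) \vee w(i_2) \vee \exists j\,.\,j \neq i_1 \wedge j \neq i_2 \wedge w(j))] \rightarrow [i_1 \neq i_2 \wedge (u(i_1) \vee u(i_2) \vee \exists j\,.\,j \neq i_1 \wedge j \neq i_2 \wedge w(j))]$$
$$\wedge\ \forall i\,.\,u(i) \rightarrow w(i)$$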


We define a translation of MIL formulae into boolean formulae of unbounded size. Given a function $M : [1,n] \to \mathbb{N}$, the unfolding of a MIL sentence $\phi$ is the boolean formula $B_M(\phi)$ obtained by replacing each existential [universal] quantifier $\exists i\,.\,\psi(i)$ [$\forall i\,.\,\psi(i)$], for $i \in \mathsf{Var}^k$, by a finite disjunction [conjunction] $\bigvee_{\ell=1}^{M(k)} \psi[\ell/i]$ [$\bigwedge_{\ell=1}^{M(k)} \psi[\ell/i]$], where the substitution of the constant $\ell \in [1, M(k)]$ for the variable $i$ is defined recursively as usual, except for $pr(i)[\ell/i] \stackrel{def}{=} (pr, \ell)$, which is a propositional variable. Further, we relate structures to boolean valuations of unbounded sizes. For a structure $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$ we define the boolean valuation $\beta_{\mathcal{I}}((pr, \ell)) = \top$ if and only if $\ell \in \iota(pr)$, for each predicate symbol $pr$ and each integer constant $\ell$. Conversely, for each valuation $\beta$ of the propositional variables $(pr, \ell)$, there exists a structure $\mathcal{I}_\beta = (\mathfrak{U}, \nu, \iota)$ such that $\iota(pr) \stackrel{def}{=} \{\ell \mid \beta((pr, \ell)) = \top\}$, for each $pr \in \mathsf{Pred}$. The following lemma relates the semantics of MIL formulae with that of their boolean unfoldings:


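Lemma 3. Given a MIL sentence $\phi$ and a function $M : [1,n] \to \mathbb{N}$, the following hold:

1. for each structure $\mathcal{I} \in [\![\phi]\!]$, we have $\beta_{\mathcal{I}} \in [\![B_M(\phi)]\!]$ and conversely, for each valuation $\beta \in [\![B_M(\phi)]\!]$, we have $\mathcal{I}_\beta \in [\![\phi]\!]$.
2. for each structure $\mathcal{I} \in [\![\phi]\!]^\mu$, we have $\beta_{\mathcal{I}} \in [\![B_M(\phi)]\!]^\mu$ and conversely, for each valuation $\beta \in [\![B_M(\phi)]\!]^\mu$, we have $\mathcal{I}_\beta \in [\![\phi]\!]^\mu$.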
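The unfolding of a sentence into a boolean formula can be sketched as a recursive function over a small formula representation. Everything about the encoding is an assumption made for illustration: formulas are nested tuples, quantifiers carry the index `k` of the component type of their variable, and a valuation is represented as the set of propositional variables `(pr, l)` that are true.

```python
def unfold(phi, M, env=None):
    """Unfold a MIL sentence into a boolean formula, given instance counts M[k]."""
    env = env or {}
    tag = phi[0]
    if tag == "pred":                      # pr(i)[l/i] becomes the variable (pr, l)
        _, pr, i = phi
        return ("var", pr, env[i])
    if tag == "eq":                        # equalities between constants are decided
        _, i, j = phi
        return ("const", env[i] == env[j])
    if tag == "not":
        return ("not", unfold(phi[1], M, env))
    if tag in ("and", "or"):
        return (tag, [unfold(phi[1], M, env), unfold(phi[2], M, env)])
    if tag in ("exists", "forall"):        # finite disjunction / conjunction
        _, i, k, body = phi
        parts = [unfold(body, M, {**env, i: l}) for l in range(1, M[k] + 1)]
        return ("or" if tag == "exists" else "and", parts)
    raise ValueError(f"unknown connective: {tag}")

def evaluate(b, beta):
    """Evaluate an unfolded formula; beta is the set of true variables (pr, l)."""
    tag = b[0]
    if tag == "var":
        return (b[1], b[2]) in beta
    if tag == "const":
        return b[1]
    if tag == "not":
        return not evaluate(b[1], beta)
    if tag == "and":
        return all(evaluate(x, beta) for x in b[1])
    return any(evaluate(x, beta) for x in b[1])

# forall i . u(i) -> w(i), written as forall i . (not u(i)) or w(i), with M(1) = 3
theta = ("forall", "i", 1, ("or", ("not", ("pred", "u", "i")), ("pred", "w", "i")))
unfolded = unfold(theta, {1: 3})
```

The result is a conjunction of three clauses, one per instance, and its size grows with `M`, which is the reason the later sections look for a fixed-size MIL representation.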
Considering the MIL formula $Init(S) \stackrel{def}{=} \bigvee_{k=1}^{n} \exists i_k\,.\,s_0^k(i_k)$, that defines the set of initial configurations of a parametric system $S$, the following lemma formalizes the intuition behind the definition of parametric trap constraints:


Lemma 4. Let $S$ be a parametric system with interaction formula $\Gamma$ and $\mathcal{I}$ be a structure. Then $\mathcal{I} \models \Theta(\Gamma) \wedge Init(S)$ iff $\{(s, k) \mid k \in \iota(s)\}$ is a marked trap of $\mathcal{N}_S$. Moreover, $\mathcal{I} \in [\![\Theta(\Gamma) \wedge Init(S)]\!]^\mu$ iff $\{(s, k) \mid k \in \iota(s)\}$ is a minimal marked trap of $\mathcal{N}_S$.


We are currently left with the task of computing a MIL formula which defines the trap invariant $Trap(\mathcal{N}_S)$ of a parametric component-based system $S = (C^1, \ldots, C^n, M, \Gamma)$. The difficulty lies in the fact that the size of $\mathcal{N}_S$ and thus, that of the boolean formula $Trap(\mathcal{N}_S)$ depends on the number $M(k)$ of instances of each component type $k \in [1, n]$. As we aim at computing an invariant able to prove safety properties, such as deadlock freedom, independently of how many components are present in the system, we must define the trap invariant using a formula depending exclusively on $\Gamma$, i.e. not on $M$.

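Observe first that $Trap(\mathcal{N}_S)$ can be equivalently defined using only the minimal marked traps of $\mathcal{N}_S$, which, by Lemma 4, are exactly the sets $\{(s, k) \mid k \in \iota(s)\}$, defined by some structure $(\mathfrak{U}, \nu, \iota) \in [\![\Theta(\Gamma) \wedge Init(S)]\!]^\mu$. Assuming that the set of structures $[\![\Theta(\Gamma) \wedge Init(S)]\!]^\mu$, or an over-approximation of it, can be defined by a positive MIL formula, the trap invariant is defined using a generalization of boolean dualisation to predicate logic, defined recursively, as follows:

$$\begin{array}{llll}
(i = j)^{\sim} \stackrel{def}{=} \neg\, i = j & (\phi_1 \vee \phi_2)^{\sim} \stackrel{def}{=} \phi_1^{\sim} \wedge \phi_2^{\sim} & (\exists i\,.\,\phi_1)^{\sim} \stackrel{def}{=} \forall i\,.\,\phi_1^{\sim} & pr(i)^{\sim} \stackrel{def}{=} pr(i) \\
(\neg\, i = j)^{\sim} \stackrel{def}{=} i = j & (\phi_1 \wedge \phi_2)^{\sim} \stackrel{def}{=} \phi_1^{\sim} \vee \phi_2^{\sim} & (\forall i\,.\,\phi_1)^{\sim} \stackrel{def}{=} \exists i\,.\,\phi_1^{\sim} &
\end{array}$$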
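The recursive dualisation can be written down directly over a formula representation. This is a minimal sketch under assumed conventions: positive formulas are nested tuples in negation normal form, with `("neq", i, j)` standing for a negated equality, and the sample formula is hypothetical.

```python
def dual(phi):
    """Dualise a positive formula: swap and/or, exists/forall, and flip equalities."""
    tag = phi[0]
    if tag == "eq":
        return ("neq",) + phi[1:]
    if tag == "neq":
        return ("eq",) + phi[1:]
    if tag == "pred":                      # predicate atoms are left unchanged
        return phi
    if tag == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if tag == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    if tag == "exists":
        return ("forall", phi[1], dual(phi[2]))
    if tag == "forall":
        return ("exists", phi[1], dual(phi[2]))
    raise ValueError(f"unknown connective: {tag}")

# exists i . u(i) and w(i)   dualises to   forall i . u(i) or w(i)
positive = ("exists", "i", ("and", ("pred", "u", "i"), ("pred", "w", "i")))
dualised = dual(positive)
```

Applying `dual` twice returns the original formula, which is a quick sanity check on any implementation of these rules.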
The crux of the method is the ability of defining, given an arbitrary MIL formula $\phi$, a positive MIL formula $\phi^\oplus$ that preserves its minimal models, formally $\phi \equiv^\mu \phi^\oplus$. Because of quantification over unbounded domains, a MIL formula $\phi$ does not have a disjunctive normal form and thus one cannot define $\phi^\oplus$ by simply deleting the negative literals in DNF, as was done for the definition of the positivation operation $(.)^{+}$, in the propositional case. For now we assume that the transformation $(.)^\oplus$ of monadic predicate formulae into positive formulae preserving minimal models is defined (a detailed presentation of this step is given next in Sect. 3) and close this section with a parametric counterpart of Theorem 1.


Theorem 2. For any parametric system $S = (C^1, \ldots, C^n, M, \Gamma)$, we have

$$Trap(\mathcal{N}_S) \equiv B_M\left(\left((\Theta(\Gamma) \wedge Init(S))^\oplus\right)^{\sim}\right)$$


3 Cardinality Constraints 


This section is concerned with the definition of a positivation operator $(.)^\oplus$ for MIL sentences, whose only requirements are that $\phi^\oplus$ is positive and $\phi \equiv^\mu \phi^\oplus$. For this purpose, we use a logic of quantifier-free boolean cardinality constraints [4,18] as an intermediate language, on which the positive formulae are defined. The translation of MIL into cardinality constraints is done by an equivalence-preserving quantifier elimination procedure, described in Sect. 3.1. As a byproduct, since the satisfiability of quantifier-free cardinality constraints is NP-complete [18] and integrated with SMT [4], we obtain a practical decision procedure for MIL that does not use model enumeration, as suggested by the small model property [19]. Finally, the definition of a positive MIL formula from a boolean combination of quantifier-free cardinality constraints is given in Sect. 3.2.

We start by giving the definition of cardinality constraints. Given the set of monadic predicate symbols $\mathsf{Pred}$, a boolean term is generated by the syntax:

$$t := pr \in \mathsf{Pred} \mid \neg t_1 \mid t_1 \wedge t_2 \mid t_1 \vee t_2$$

When there is no risk of confusion, we borrow the terminology of propositional logic and say that a term is in DNF if it is a disjunction of conjunctions (minterms). We also write $t_1 \rightarrow t_2$ if and only if the implication is valid when $t_1$ and $t_2$ are interpreted as boolean formulae, with each predicate symbol viewed as a propositional variable. Two boolean terms $t_1$ and $t_2$ are said to be compatible if and only if $t_1 \wedge t_2$ is satisfiable, when viewed as a boolean formula.

For a boolean term $t$ and a first-order variable $i \in \mathsf{Var}$, we define the shorthand $t(i)$ recursively, as $(\neg t_1)(i) \stackrel{def}{=} \neg t_1(i)$, $(t_1 \wedge t_2)(i) \stackrel{def}{=} t_1(i) \wedge t_2(i)$ and $(t_1 \vee t_2)(i) \stackrel{def}{=} t_1(i) \vee t_2(i)$. Given a positive integer $n \in \mathbb{N}$ and $t$ a boolean term, we define the following cardinality constraints, by MIL formulae:

$$|t| \ge n \stackrel{def}{=} \exists i_1 \ldots \exists i_n\,.\,distinct(i_1, \ldots, i_n) \wedge \bigwedge_{j=1}^{n} t(i_j) \qquad |t| \le n \stackrel{def}{=} \neg(|t| \ge n + 1)$$

We shall further use cardinality constraints with $n = \infty$, by defining $|t| \ge \infty \stackrel{def}{=} \bot$ and $|t| \le \infty \stackrel{def}{=} \top$. The intuitive semantics of cardinality constraints is formally defined in terms of structures $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$ by the semantics of monadic predicate logic, given in the previous section. For instance, $|p \wedge q| \ge 1$ means that the intersection of the sets $p$ and $q$ is not empty, whereas $|\neg p| \le 0$ means that $p$ contains all elements from the universe.
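The set-based reading of cardinality constraints can be made concrete with a small evaluator. This is an illustrative sketch, not part of the method: boolean terms are encoded as predicate names or nested tuples, and the function names `extension`, `at_least` and `at_most` are hypothetical.

```python
import math

def extension(t, iota, universe):
    """Set of universe elements that satisfy the boolean term t under iota."""
    if isinstance(t, str):                 # a predicate symbol
        return set(iota[t])
    if t[0] == "not":
        return set(universe) - extension(t[1], iota, universe)
    left = extension(t[1], iota, universe)
    right = extension(t[2], iota, universe)
    return left & right if t[0] == "and" else left | right

def at_least(t, n, iota, universe):
    """|t| >= n: at least n distinct elements satisfy t."""
    return len(extension(t, iota, universe)) >= n

def at_most(t, n, iota, universe):
    """|t| <= n, defined as the negation of |t| >= n + 1."""
    return not at_least(t, n + 1, iota, universe)

universe = range(1, 4)
iota = {"p": {1, 2, 3}, "q": {2}}
nonempty_meet = at_least(("and", "p", "q"), 1, iota, universe)  # p and q intersect
p_is_total = at_most(("not", "p"), 0, iota, universe)           # p covers the universe
```

With `n = math.inf`, `at_least` is always false and `at_most` is always true on a finite universe, matching the two limit cases defined in the text.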
3.1 Quantifier Elimination


Given a sentence $\phi$, written in MIL, we build an equivalent boolean combination of cardinality constraints $qe(\phi)$, using quantifier elimination. We describe the elimination of a single existential quantifier and the generalization to several existential or universal quantifiers is immediate. Assume that $\phi = \exists i_1\,.\,\bigvee_{k \in K} \psi_k(i_1, \ldots, i_m)$, where $K$ is a finite set of indices and, for each $k \in K$, $\psi_k$ is a quantifier-free conjunction of atomic propositions of the form $i_j = i_\ell$, $pr(i_j)$ and their negations, for some $j, \ell \in [1,m]$. We write, equivalently, $\phi \equiv \bigvee_{k \in K} \varphi_k \wedge \exists i_1\,.\,\theta_k(i_1, \ldots, i_m)$, where $\varphi_k$ does not contain occurrences of $i_1$ and $\theta_k$ is a conjunction of literals of the form $pr(i_1)$, $\neg pr(i_1)$, $i_1 = i_j$ and $\neg\, i_1 = i_j$, for some $j \in [2, m]$. For each $k \in K$, we distinguish the following cases:

1. if $i_1 = i_j$ is a consequence of $\theta_k$, for some $j > 1$, let $qe(\exists i_1\,.\,\theta_k) \stackrel{def}{=} \theta_k[i_j/i_1]$.
2. else, $\theta_k = \bigwedge_{j \in J_k} \neg\, i_1 = i_j \wedge t_k(i_1)$ for some $J_k \subseteq [2, m]$ and boolean term $t_k$, and let:

$$qe(\exists i_1\,.\,\theta_k) \stackrel{def}{=} \bigwedge_{J \subseteq J_k} \left[distinct(\{i_j\}_{j \in J}) \wedge \bigwedge_{j \in J} t_k(i_j)\right] \rightarrow |t_k| \ge \|J\| + 1$$
$$qe(\phi) \stackrel{def}{=} \bigvee_{k \in K} \varphi_k \wedge qe(\exists i_1\,.\,\theta_k)$$

Universal quantification is dealt with using the duality $qe(\forall i_1\,.\,\psi) \stackrel{def}{=} \neg qe(\exists i_1\,.\,\neg\psi)$. For a prenex formula $\phi = Q_n i_n \ldots Q_1 i_1\,.\,\psi$, where $Q_1, \ldots, Q_n \in \{\exists, \forall\}$ and $\psi$ is quantifier-free, we define, recursively $qe(\phi) \stackrel{def}{=} qe(Q_n i_n\,.\,qe(Q_{n-1} i_{n-1} \ldots Q_1 i_1\,.\,\psi))$. It is easy to see that, if $\phi$ is a sentence, $qe(\phi)$ is a boolean combination of cardinality constraints.
The correctness of the construction is a consequence of the following lemma: 


Lemma 5. Given a MIL formula $\phi = Q_n i_n \ldots Q_1 i_1\,.\,\psi$, where $Q_1, \ldots, Q_n \in \{\forall, \exists\}$ and $\psi$ is a quantifier-free conjunction of equality and predicate atoms, we have $\phi \equiv qe(\phi)$.
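As a sanity check of the second elimination case, the smallest instance can be verified by brute force. The sketch below takes the hypothetical formula $\exists i_1\,.\,\neg\, i_1 = i_2 \wedge p(i_1)$, for which $J_k = \{2\}$ and $t_k = p$, so that the eliminated form reads $|p| \ge 1 \wedge (p(i_2) \rightarrow |p| \ge 2)$. The function name `qe_agrees` and the chosen universe sizes are assumptions for illustration only.

```python
from itertools import combinations

def qe_agrees(n):
    """Compare both sides for every set p and every value of i2 in a universe of size n."""
    universe = range(1, n + 1)
    for r in range(n + 1):
        for p in map(set, combinations(universe, r)):
            for i2 in universe:
                # Left-hand side: some i1 different from i2 belongs to p.
                lhs = any(i1 != i2 and i1 in p for i1 in universe)
                # Right-hand side: J = {} gives |p| >= 1, J = {2} gives p(i2) -> |p| >= 2.
                rhs = len(p) >= 1 and (i2 not in p or len(p) >= 2)
                if lhs != rhs:
                    return False
    return True

all_agree = all(qe_agrees(n) for n in range(1, 6))
```

The check only covers one formula on small universes; it illustrates the shape of the construction rather than proving Lemma 5.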


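Example 3. (contd. from Example 2) Below we show the results of quantifier elimination applied to the conjunction $\Theta(\Gamma) \wedge Init(S)$ for the system in Fig. 1b:

$$\begin{array}{l}
(\neg r \wedge \neg s \wedge |w \wedge \neg u| \le 0 \wedge |u \wedge \neg w| \le 0 \wedge 1 \le |w|)\ \vee \\
(\neg r \wedge |w \wedge \neg u| \le 0 \wedge |\neg w| \le 0 \wedge 1 \le |w|) \vee (s \wedge r) \vee (s \wedge |\neg w| \le 0 \wedge 1 \le |w|)\ \vee \\
(\neg s \wedge |\neg u| \le 0 \wedge |u \wedge \neg w| \le 0 \wedge 1 \le |w|) \vee (|\neg u| \le 0 \wedge |\neg w| \le 0 \wedge 1 \le |w|).
\end{array}$$

Similarly, for the system in Fig. 3, we obtain the following cardinality constraints:

$$\begin{array}{l}
(3 \le |w| \wedge |u \wedge \neg w| \le 0) \vee (2 \le |w| \wedge |w \wedge \neg u| \le 1 \wedge |u \wedge \neg w| \le 0)\ \vee \\
(|u| \le 1 \wedge |\neg u \wedge \neg w| \le 0 \wedge |u \wedge \neg w| \le 0 \wedge 1 \le |w|) \vee (|w \wedge \neg u| \le 0 \wedge |u \wedge \neg w| \le 0 \wedge 1 \le |w|).
\end{array}$$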


3.2 Building Positive Formulae that Preserve Minimal Models 


Let $\phi$ be a MIL formula, not necessarily positive. We shall build a positive formula $\phi^\oplus$, such that $\phi \equiv^\mu \phi^\oplus$. By the result of the last section, $\phi$ is equivalent to a boolean combination of cardinality constraints $qe(\phi)$, obtained by quantifier elimination. Thus we assume w.l.o.g. that the DNF of $\phi$ is a disjunction of conjunctions of the form $\bigwedge_{i \in L} |t_i| \ge \ell_i \wedge \bigwedge_{j \in U} |t_j| \le u_j$, for some sets of indices $L$, $U$ and some positive integers $\{\ell_i\}_{i \in L}$ and $\{u_j\}_{j \in U}$.

For a boolean combination of cardinality constraints $\psi$, we denote by $P(\psi)$ the set of predicate symbols that occur in a boolean term of $\psi$ and by $P^{+}(\psi)$ ($P^{-}(\psi)$) the set of predicate symbols that occur under an even (odd) number of negations in $\psi$. The following proposition allows us to restrict the form of $\phi$ even further, without losing generality:


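Proposition 1. Given MIL formulae $\phi_1$ and $\phi_2$, for any positivation operator $(.)^\oplus$, the following hold:

1. $(\phi_1 \vee \phi_2)^\oplus \equiv^\mu \phi_1^\oplus \vee \phi_2^\oplus$,
2. $(\phi_1 \wedge \phi_2)^\oplus \equiv^\mu \phi_1^\oplus \wedge \phi_2^\oplus$, provided that $P(\phi_1) \cap P(\phi_2) = \emptyset$.


From now on, we assume that $\phi$ is a conjunction of cardinality constraints that cannot be split as $\phi = \phi_1 \wedge \phi_2$, such that $P(\phi_1) \cap P(\phi_2) = \emptyset$.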

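Let us consider a cardinality constraint $|t| \ge \ell$ that occurs in $\phi$. Given a set $P$ of predicate symbols, for a set of predicates $S \subseteq P$, the complete boolean minterm corresponding to $S$ with respect to $P$ is $t^{P}_{S} \stackrel{def}{=} \bigwedge_{p \in S} p \wedge \bigwedge_{p \in P \setminus S} \neg p$. Moreover, let $\mathcal{S}_t \stackrel{def}{=} \{S \subseteq P(\phi) \mid t^{P(\phi)}_{S} \rightarrow t\}$ be the set of sets $S$ of predicate symbols for which the complete minterm $t^{P(\phi)}_{S}$ implies $t$. Finally, each cardinality constraint $|t| \ge \ell$ is replaced by the equivalent disjunction⁷, in which each boolean term is complete with respect to $P(\phi)$:

$$|t| \ge \ell \equiv \bigvee \left\{ \bigwedge_{S \in \mathcal{S}_t} |t^{P(\phi)}_{S}| \ge \ell_S \;\middle|\; \text{for some constants } \{\ell_S \in \mathbb{N}\}_{S \in \mathcal{S}_t} \text{, such that } \sum_{S \in \mathcal{S}_t} \ell_S = \ell \right\}$$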


Note that because any two complete minterms $t_S$ and $t_T$, for $S \neq T$, are incompatible, then necessarily $|t_S \vee t_T| = |t_S| + |t_T|$. Thus $|t_S \vee t_T| \ge \ell$ if and only if there exist $\ell_1, \ell_2 \in \mathbb{N}$ such that $\ell_1 + \ell_2 = \ell$ and $|t_S| \ge \ell_1$, $|t_T| \ge \ell_2$, respectively.
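The splitting of a lower bound across pairwise incompatible minterms can be sketched by enumerating the ways of writing the bound as a sum of naturals. The helper names `splits` and `at_least_via_minterms` are hypothetical, and the sizes of the minterm extensions are given directly as numbers rather than computed from a structure.

```python
def splits(total, parts):
    """All tuples of `parts` natural numbers that sum to `total` (parts >= 1)."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in splits(total - first, parts - 1):
            yield (first,) + rest

def at_least_via_minterms(sizes, bound):
    """Decide |t| >= bound, where sizes[S] is the size of the disjoint minterm t_S."""
    keys = list(sizes)
    return any(all(sizes[k] >= part for k, part in zip(keys, split))
               for split in splits(bound, len(keys)))

# Two incompatible minterms with 2 and 1 elements: their union has 3 elements.
sizes = {"S": 2, "T": 1}
reaches_three = at_least_via_minterms(sizes, 3)
reaches_four = at_least_via_minterms(sizes, 4)
```

Because the minterms are disjoint, the disjunction over all splits agrees with simply comparing the bound against the sum of the sizes.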

Notice that restricting the sets of predicates in $\mathcal{S}_t$ to subsets of $P(\phi)$, instead of the entire set of predicates, allows us to apply Proposition 1 and reduce the number of complete minterms to be considered. That is, whenever possible, we write each minterm $\bigwedge_{i \in L} |t_i| \ge \ell_i \wedge \bigwedge_{j \in U} |t_j| \le u_j$ in the DNF of $\phi$ as $\psi_1 \wedge \ldots \wedge \psi_k$, such that $P(\psi_i) \cap P(\psi_j) = \emptyset$ for all $1 \le i < j \le k$. In practice, this optimisation turns out to be quite effective, as shown by the small execution times of our test cases, reported in Sect. 5.

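The second step is building, for each conjunction $C = \bigwedge \{\ell_S \le |t^{P(\phi)}_{S}| \wedge |t^{P(\phi)}_{S}| \le u_S \mid S \subseteq P(\phi)\}$⁸, as above, a positive formula $C^\oplus$, that preserves its set of minimal models $[\![C]\!]^\mu$. The generalization to arbitrary boolean combinations of cardinality constraints is a direct consequence of Proposition 1. Let $\mathcal{L}^{+}(\phi)$ (resp. $\mathcal{L}^{-}(\phi)$) be the set of positive boolean combinations of predicate symbols $p \in P^{+}(\phi)$ (resp. $\neg p$, where $p \in P^{-}(\phi)$). Further, for a complete minterm $t_S$, we write $t^{+}_{S}$ ($t^{-}_{S}$) for the conjunction of the positive (negative) literals in $t_S$. Then, we define:

$$C^\oplus \stackrel{def}{=} \bigwedge \left\{ |\tau| \ge \sum_{t^{+}_{S} \rightarrow \tau} \ell_S \;\middle|\; \tau \in \mathcal{L}^{+}(\phi) \right\} \wedge \bigwedge \left\{ |\tau| \le \sum_{t^{-}_{S} \rightarrow \tau} u_S \;\middle|\; \tau \in \mathcal{L}^{-}(\phi) \right\}$$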


It is not hard to see that $C^\oplus$ is a positive MIL formula, because:

- for each $\tau \in \mathcal{L}^{+}(\phi)$, we have $|\tau| \ge k \equiv \exists i_1 \ldots \exists i_k\,.\,distinct(i_1, \ldots, i_k) \wedge \bigwedge_{j=1}^{k} \tau(i_j)$ and
- for each $\tau \in \mathcal{L}^{-}(\phi)$, we have $|\tau| \le k \equiv \forall i_1 \ldots \forall i_{k+1}\,.\,distinct(i_1, \ldots, i_{k+1}) \rightarrow \bigvee_{j=1}^{k+1} \neg\tau(i_j)$.

7 The constraints $|t| \le u$ are dealt with as $\neg(|t| \ge u + 1)$.
8 Missing lower bounds $\ell_S$ are replaced with 0 and missing upper bounds $u_S$ with $\infty$.


The following lemma proves that the above definition meets the second requirement of 
positivation operators, concerning the preservation of minimal models. 


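Lemma 6. Given $P$ a finite set of monadic predicate symbols, $\{\ell_S \in \mathbb{N}\}_{S \subseteq P}$ and $\{u_S \in \mathbb{N} \cup \{\infty\}\}_{S \subseteq P}$ sets of constants, for any conjunction $C = \bigwedge \{\ell_S \le |t^{P}_{S}| \wedge |t^{P}_{S}| \le u_S \mid S \subseteq P\}$, we have $C \equiv^\mu C^\oplus$.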


Example 4 (contd. from Example 3). 

Consider the first minterm of the DNF of the cardinality constraint obtained by quantifier elimination in Example 3, from the system in Fig. 1b. The result of positivation for this minterm is given below:

$$(\neg r \wedge \neg s \wedge |w \wedge \neg u| \le 0 \wedge |u \wedge \neg w| \le 0 \wedge 1 \le |w|)^\oplus = 1 \le |u \wedge w|$$

Intuitively, the negative literals $\neg r$ and $\neg s$ may safely disappear, because no minimal model will assign $r$ or $s$ to true. Further, the constraints $|w \wedge \neg u| \le 0$ and $|u \wedge \neg w| \le 0$ are equivalent to the fact that, in any structure $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$, we must have $\iota(u) = \iota(w)$. Finally, because $|w| \ge 1$, then necessarily $|u \wedge w| \ge 1$.

Similarly, the result of positivation applied to the second conjunct of the DNF cardinality constraint corresponding to the system in Fig. 3 is given below:

$$(2 \le |w| \wedge |w \wedge \neg u| \le 1 \wedge |u \wedge \neg w| \le 0)^\oplus = 2 \le |w| \wedge 1 \le |u \wedge w|$$

Here, the number of elements in $w$ is at least 2 and, in any structure $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$, we must have $\iota(u) \subseteq \iota(w)$ and at most one element in $\iota(w) \setminus \iota(u)$. Consequently, the intersection of the sets $\iota(u)$ and $\iota(w)$ must contain at least one element, i.e. $|u \wedge w| \ge 1$.
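The claim that positivation preserves minimal models can be checked by enumeration for the second minterm above. The sketch below fixes a universe of three elements (an arbitrary choice), represents a structure by the pair of sets interpreting $w$ and $u$, and uses hypothetical function names.

```python
from itertools import combinations, product

UNIVERSE = range(3)
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]

def constraint(w, u):
    """2 <= |w|  and  |w and not u| <= 1  and  |u and not w| <= 0."""
    return len(w) >= 2 and len(w - u) <= 1 and len(u - w) <= 0

def positivated(w, u):
    """2 <= |w|  and  1 <= |u and w|."""
    return len(w) >= 2 and len(u & w) >= 1

def minimal_models(formula):
    """Models (w, u) with no strictly smaller model under pointwise inclusion."""
    models = [(w, u) for w, u in product(SUBSETS, repeat=2) if formula(w, u)]
    def strictly_below(a, b):
        return a != b and a[0] <= b[0] and a[1] <= b[1]
    return {m for m in models if not any(strictly_below(o, m) for o in models)}

same_minimal_models = minimal_models(constraint) == minimal_models(positivated)
```

Both formulas have the same six minimal models on this universe: a two-element $w$ together with a one-element $u$ contained in it. The two formulas are not equivalent in general, since the positive one also admits models where $u$ is not included in $w$.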


4 Proving Deadlock Freedom of Parametric Systems 


We have gathered all the ingredients necessary for checking deadlock freedom of parametric systems, using our method based on trap invariant generation (Fig. 4). In particular, we derive a trap constraint $\Theta(\Gamma)$ directly from the interaction formula $\Gamma$, both of which are written in MIL. Second, we compute a positive formula that preserves the set of minimal models of $\Theta(\Gamma) \wedge Init(S)$, by first converting the MIL formula into a quantifier-free cardinality constraint, using quantifier elimination, and deriving a positive MIL formula from the latter.

The conjunction between the dual of this positive formula and the formula $\Delta(\Gamma)$ that defines the deadlock states is then checked for satisfiability. Formally, given a parametric system $S$, with an interaction formula $\Gamma$ written in the form (1), the MIL formula characterizing the deadlock states of the system is the following:

$$\Delta(\Gamma) \stackrel{def}{=} \forall i_1 \ldots \forall i_\ell\,.\,\varphi \rightarrow \left[\bigvee_{j=1}^{\ell} \neg{}^{\bullet}p_j(i_j) \vee \bigvee_{j=\ell+1}^{\ell+m} \exists i_j\,.\,\psi_j \wedge \neg{}^{\bullet}p_j(i_j)\right]$$


We state a sufficient verification condition for deadlock freedom in the parametric case: 
[Figure: verification workflow. From the interaction formula $\Gamma$, the trap constraints $\Theta(\Gamma) \wedge Init(S)$ are translated from monadic interaction logic into cardinality constraints by quantifier elimination (qe), positivated into $[\Theta(\Gamma) \wedge Init(S)]^\oplus$ and dualized into the trap invariant. The conjunction of the trap invariant with the deadlock states $\Delta(\Gamma)$ (the deadlock-freedom condition) goes through qe and SMT checking (CVC4): unsat means deadlock-free, sat means potential deadlock.]

Fig. 4. Verification of parametric component-based systems


Corollary 2. A parametric system $S = (C^1, \ldots, C^n, M, \Gamma)$ is deadlock-free if
$\left((\Theta(\Gamma) \wedge Init(S))^\oplus\right)^{\sim} \wedge \Delta(\Gamma) \rightarrow \bot$


The satisfiability check is carried out using the conversion to cardinality constraints
via quantifier elimination (Sect. 3.1) and an effective set theory solver for cardinality
constraints, implemented in the CVC4 SMT solver [6].
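For a fixed number of instances, the verification condition can be checked by plain enumeration instead of SMT solving. The following sketch does this for the broadcast system of Fig. 3, using the trap constraint of Example 2 and the deadlock formula for its interaction formula. It is not the method implemented with CVC4: it swaps in a brute-force search over all structures, is only meant for very small n (at least 2), and all function names are hypothetical.

```python
from itertools import combinations, product

def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def trap_constraint(W, U, n):
    """Trap constraint and Init(S) of the broadcast system, with iota(w)=W, iota(u)=U."""
    for i1, i2 in product(range(n), repeat=2):
        if i1 == i2:
            continue
        others = W - {i1, i2}              # exists j . j != i1 and j != i2 and w(j)
        pre = i1 in W or i2 in W or bool(others)
        post = i1 in U or i2 in U or bool(others)
        if pre and not post:
            return False
    return U <= W and len(W) >= 1          # forall i . u(i) -> w(i), exists i . w(i)

def minimal_traps(n):
    models = [(W, U) for W, U in product(subsets(n), repeat=2)
              if trap_constraint(W, U, n)]
    def strictly_below(a, b):
        return a != b and a[0] <= b[0] and a[1] <= b[1]
    return [m for m in models if not any(strictly_below(o, m) for o in models)]

def deadlock(W, U, n):
    """All interactions are disabled in the marking with places w(k), k in W, u(k), k in U."""
    everyone = set(range(n))
    b_disabled = all(i1 not in W or i2 not in W or bool(everyone - {i1, i2} - W)
                     for i1, i2 in product(range(n), repeat=2) if i1 != i2)
    f_disabled = not U                     # exists i . f(i) needs some instance in u
    return b_disabled and f_disabled

def deadlock_free_by_traps(n):
    traps = minimal_traps(n)
    for W, U in product(subsets(n), repeat=2):
        invariant = all(W & tw or U & tu for tw, tu in traps)  # every trap stays marked
        if invariant and deadlock(W, U, n):
            return False                   # the condition is satisfiable
    return True
```

Calling `deadlock_free_by_traps(n)` for n = 2, 3, 4 finds no marking that satisfies both the trap invariant and the deadlock formula, in line with the unsat result reported for broadcast 2/n.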


5 Experimental Results 


To assess our method for proving deadlock freedom of parametric component-based systems, we ran a number of experiments on systems with a small number of rather simple component types, but with nontrivial interaction patterns, given by MIL formulae. The task-sem i/n examples, i = 1, 2, 3, are generalizations of the parametric Task-Semaphore example depicted in Fig. 1b, in which n Tasks synchronize using n Semaphores, such that i Tasks interact with a single Semaphore at once, in a multiparty rendez-vous. In a similar vein, the broadcast i/n examples, i = 2, 3, are generalizations of the system in Fig. 3, in which i out of n Workers engage in rendez-vous on the b port, whereas all the others stay idle; here idling is modeled as a broadcast on the a ports. Finally, in the sync i/n examples, i = 1, 2, 3, we consider systems composed of n Workers (Fig. 1b) such that either i out of n instances simultaneously interact on the b ports, or all interact on the f ports. Notice that, for i = 2, 3, these systems have a deadlock if and only if $n \not\equiv 0 \bmod i$. This is because, if $n \equiv m \bmod i$, for some $0 < m < i$, there will be m instances that cannot synchronize on their b port, in order to move from w to u, in order to engage in the f broadcast.
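The modulo argument for the sync i/n examples can be replayed on a counter abstraction. The sketch below assumes, following the description above, that all Workers start in state w, that b moves i Workers from w to u, and that f moves all Workers from u back to w; the function name `has_deadlock` is hypothetical.

```python
def has_deadlock(n, i):
    """Explore the reachable values of k, the number of Workers in state u."""
    seen, frontier = set(), [0]
    while frontier:
        k = frontier.pop()
        if k in seen:
            continue
        seen.add(k)
        successors = []
        if n - k >= i:                     # i Workers in state w jointly fire b
            successors.append(k + i)
        if k == n:                         # all Workers are in u and fire f together
            successors.append(0)
        if not successors:
            return True                    # no interaction is enabled
        frontier.extend(successors)
    return False

deadlocks = {(n, i): has_deadlock(n, i) for n in range(1, 13) for i in (2, 3)}
```

In this abstraction a deadlock is reachable exactly when n is not a multiple of i, and never for i = 1.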

All experiments were carried out on an Intel(R) Xeon(R) CPU @ 2.00 GHz virtual machine with 4GB of RAM. Table 1 shows separately the times needed to generate the proof obligations (trap invariants and deadlock states) from the interaction formulae and the times needed by CVC4 1.7 to show unsatisfiability or come up with a model. All systems considered, for which deadlock freedom could not be shown using our method, have a real deadlock scenario that manifests only under certain modulo constraints on the number n of instances. These constraints cannot be captured by MIL formulae, or, equivalently by cardinality constraints, and would require cardinality constraints of the form $|t| = n \bmod m$, for some constants $n, m \in \mathbb{N}$.

Table 1. Benchmarks

example | interaction formula | t-gen | t-smt | result
task-sem 1/n | $\exists i \exists j_1.\ a(i) \wedge b(j_1) \vee \exists i \exists j_1.\ e(i) \wedge f(j_1)$ | 22 ms | 20 ms | unsat
task-sem 2/n | $\exists i \exists j_1 \exists j_2.\ j_1 \neq j_2 \wedge a(i) \wedge b(j_1) \wedge b(j_2) \vee \exists i \exists j_1 \exists j_2.\ j_1 \neq j_2 \wedge e(i) \wedge f(j_1) \wedge f(j_2)$ | 34 ms | 40 ms | unsat
task-sem 3/n | $\exists i \exists j_1 \exists j_2 \exists j_3.\ distinct(j_1, j_2, j_3) \wedge a(i) \wedge b(j_1) \wedge b(j_2) \wedge b(j_3) \vee \exists i \exists j_1 \exists j_2 \exists j_3.\ distinct(j_1, j_2, j_3) \wedge e(i) \wedge f(j_1) \wedge f(j_2) \wedge f(j_3)$ | 73 ms | 40 ms | unsat
broadcast 2/n | $\exists i_1 \exists i_2.\ i_1 \neq i_2 \wedge b(i_1) \wedge b(i_2) \wedge \forall j.\ j \neq i_1 \wedge j \neq i_2 \rightarrow a(j) \vee \exists i.\ f(i)$ | 14 ms | 20 ms | unsat
broadcast 3/n | $\exists i_1 \exists i_2 \exists i_3.\ distinct(i_1, i_2, i_3) \wedge b(i_1) \wedge b(i_2) \wedge b(i_3) \wedge \forall j.\ j \neq i_1 \wedge j \neq i_2 \wedge j \neq i_3 \rightarrow a(j) \vee \exists i.\ f(i)$ | 409 ms | 20 ms | unsat
sync 1/n | $\exists i.\ b(i) \vee \forall i.\ f(i)$ | 5 ms | 20 ms | unsat
sync 2/n | $\exists i_1 \exists i_2.\ i_1 \neq i_2 \wedge b(i_1) \wedge b(i_2) \vee \forall i.\ f(i)$ | 7 ms | 50 ms | sat
sync 3/n | $\exists i_1 \exists i_2 \exists i_3.\ distinct(i_1, i_2, i_3) \wedge b(i_1) \wedge b(i_2) \wedge b(i_3) \vee \forall i.\ f(i)$ | 11 ms | 40 ms | sat


6 Conclusions 


This work is part of a lasting research program on BIP linking two work directions: 
(1) recent work on modeling architectures using interaction logics, and (2) older work 
on verification by using invariants. Its rationale is to overcome as much as possible 
complexity and undecidability issues by proposing methods which are adequate for the 
verification of essential system properties. 

The presented results are applicable to a large class of architectures characterized by the MIL. A key technical result is the translation of MIL formulas into cardinality constraints. This allows on the one hand the computation of the MIL formula characterizing the minimal trap invariant. On the other hand, it provides a decision procedure for MIL that leverages recent advances in SMT, implemented in the CVC4 solver [6].
Abstract. Reasoning about the correctness of parallel and distributed
systems requires automated tools. By now, the mCRL2 toolset and lan- 
guage have been developed over a course of more than fifteen years. In 
this paper, we report on the progress and advancements over the past six 
years. Firstly, the mCRL2 language has been extended to support the 
modelling of probabilistic behaviour. Furthermore, the usability has been 
improved with the addition of refinement checking, counterexample gen- 
eration and a user-friendly GUI. Finally, several performance improve- 
ments have been made in the treatment of behavioural equivalences. 
Besides the changes to the toolset itself, we cover recent applications 
of mCRL2 in software product line engineering and the use of domain 
specific languages (DSLs). 


1 Introduction 


Parallel programs and distributed systems are becoming increasingly common. This is
driven by the fact that Dennard's scaling theory [17], stating that every new processor
core is expected to provide a performance gain over older cores, does not
hold any more, and instead performance is to be gained from exploiting multiple
cores. Consequently, distributed system paradigms such as cloud computing have
grown popular. However, designing parallel and distributed systems correctly is 
notoriously difficult. Unfortunately, it is all too common to observe flaws such 
as data loss and hanging systems. Although these may be acceptable for many 
non-critical applications, the occasional hiccup may be impermissible for critical 
applications, e.g., when giving rise to increased safety risks or financial loss. 
The mCRL2 toolset is designed to reason about concurrent and distributed 
systems. Its language [27] is based on a rich, ACP-style process algebra and 
has an axiomatic view on processes. The data theory is rooted in the theory of 
abstract data types (ADTs). The toolset consists of over sixty tools supporting 
visualisation, simulation, minimisation and model checking of complex systems. 


© The Author(s) 2019 
T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 21-39, 2019. 
https://doi.org/10.1007/978-3-030-17465-1_2 
In this paper, we present an overview of the mCRL2 toolset in general,
focussing on the developments from the past six years. We first present a cursory 
overview of the mCRL2 language, and discuss the recent addition of support for 
modelling and analysing probabilistic processes. 

Behavioural equivalences such as strong and branching bisimulation are used to reduce and compare state spaces of complex systems. Recently, the complexity of branching bisimulation has been significantly improved from O(mn) to O(m(log |Act| + log n)), where m is the number of transitions, n the number of states, and Act the set of actions. This was achieved by implementing the new algorithm by Groote et al. [24]. Additionally, support for checking (weak) failures refinement and failures divergence refinement has been added.

Model checking in mCRL2 is based on parameterised boolean equation systems (PBES) [33] that combine information from a given mCRL2 specification and a property in the modal μ-calculus. Solving the PBES answers the encoded model checking problem. Recent developments include improved static analysis of PBESs using liveness analysis, and solving PBESs for infinite-state systems using symbolic quotienting algorithms and abstraction. One of the major features recently introduced is the ability to generate comprehensive counterexamples in the form of a subgraph of the original system.

To aid novice users of mCRL2, an alternative graphical user-interface (GUI), mcrl2ide, has been added, that contains a text editor to create mCRL2 specifications, and provides access to the core functionality of mCRL2 without requiring the user to know the interface of each of the sixty tools. The use of the language and tools is illustrated by means of a selection of case studies conducted with mCRL2. We focus on the application of the tools as a verification back-end for domain specific languages (DSLs), and the verification of software product lines.

The mCRL2 toolset can be downloaded from the website www.mcrl2.org. This includes binaries as well as source code packages¹. To promote external contributions, the source code of mCRL2 and the corresponding issue tracker have been moved to GitHub.² The mCRL2 toolset is open source under the permissive Boost license, which allows free use for any purpose. Technical documentation and a user manual of the mCRL2 toolset, including a tutorial, can be found on the website. An extensive introduction to the mCRL2 language can be found in the textbook Modeling and analysis of communicating systems [27].

The rest of the paper is structured as follows. Section 2 introduces the basics 
of the mCRL2 language and Sect. 3 its probabilistic extension. In Sect. 4, we dis- 
cuss several new and improved tools for various behavioural relations. Section 5 
gives an overview of novel analysis techniques for PBESs, while Sect. 6 introduces 
mCRL2's improved GUI and Sect. 7 discusses a number of applications. Related
work is discussed in Sect. 8, and Sect. 9 presents a conclusion and future plans.


1 The source code is also archived on https://doi.org/10.5281/zenodo.2555054.
2 https://github.com/mCRL2org/mCRL2.
2 The mCRL2 Language and Workflow


The behavioural specification language mCRL2 [27] is the successor of μCRL
(micro Common Representation Language [28]) that was in turn a response to
a language called CRL (Common Representation Language) that became so
complex that it would not serve a useful purpose.


sort Content = struct bad_data | data1 | data2;
act  read, deliver, get, put, pass_on : Content;

proc Filter =
       Σ c:Content . get(c) · (c ≈ bad_data → Filter ⋄ put(c) · Filter);
     Queue(q : List(Content)) =
       Σ c:Content . read(c) · Queue(c ▷ q) +
       q ≉ [] → deliver(rhead(q)) · Queue(rtail(q));

init ∇{get, deliver, pass_on}(Γ{put|read → pass_on}(Filter || Queue([])));


Fig. 1. A filter process communicating with an infinite queue in mCRL2. 


The languages µCRL and mCRL2 are quite similar combinations of process
algebra in the style of ACP [8] together with equational abstract data types [19]. 
A typical example illustrating most of the language features of mCRL2 is given 
in Fig. 1, which shows a filter process (Filter) that iteratively reads data via an 
action get and forwards it to a queue using the action put if the data is not 
bad. The queue (Queue) is infinitely sized, reading data via the action read and 
delivering data via the action deliver. The processes are put in parallel using the 
parallel operator ||. The actions put and read are forced to synchronise into the 
action pass_on using the communication operator Γ and the allow operator ∇.
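The behaviour described above can be mimicked outside mCRL2. The following is a minimal Python sketch, not part of the toolset; the function name `run_filter_queue` and the chosen interleaving (all deliveries happen after the input has been consumed) are assumptions made for illustration. It records only the actions that remain visible after the allow operator is applied.

```python
from collections import deque

BAD_DATA = "bad_data"

def run_filter_queue(inputs):
    """Feed `inputs` through the filter into an unbounded FIFO queue.

    Returns one possible sequence of visible actions (get, pass_on, deliver).
    In this sketch all deliveries are postponed until the input is consumed.
    """
    queue = deque()
    trace = []
    for item in inputs:
        trace.append(("get", item))        # Filter reads a value
        if item != BAD_DATA:               # bad data is dropped by the filter
            queue.appendleft(item)         # put and read synchronise into pass_on
            trace.append(("pass_on", item))
    while queue:
        # rhead/rtail: the element at the far end was inserted first
        trace.append(("deliver", queue.pop()))
    return trace
```

Calling `run_filter_queue(["data1", "bad_data", "data2"])` yields a trace in which `bad_data` is read but never passed on, and the two remaining items are delivered in their order of arrival.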

The language mCRL2 only contains a minimal set of primitives to express 
behaviour, but this set is well chosen such that the behaviour of communicating
systems can be easily expressed. Both µCRL and mCRL2 allow systems with time
to be expressed, using positive real time tags to indicate when an action takes place.
Recently the possibility has been added to express probabilistic behaviour in 
mCRL2, which will be explained in Sect. 3. 

The differences between µCRL and mCRL2 are minor but significant. In
mCRL2 the if-then-else is written as c→p◇q (was p◁c▷q). mCRL2 allows for
multi-actions, e.g., a|b|c expresses that the actions a, b and c happen at the
same time. mCRL2 does not allow multiple actions with the same time tag
to happen consecutively (µCRL does, as do most other process specification
formalisms with time). Finally, mCRL2 has built-in standard datatypes and
mechanisms for specifying datatypes far more compactly, and it allows for function
datatypes, including lambda expressions, as well as arbitrary sets and bags.

The initial purpose of µCRL was to have a mathematical language to model
realistic protocols and distributed systems of which the correctness could be 
proven manually using process algebraic axioms and rules, as well as the equa- 
tions for the equational data types. The result of this is that mCRL2 is equipped 
with a nice fundamental theory as well as highly effective proof methods [29,30], 
which have been used, for instance, to provide a concise, computer checked proof 
of the correctness of Tanenbaum’s most complex sliding window protocol [1]. 

When the language µCRL began to be used for specifying actual systems [20],
it became obvious that such behavioural specifications are too large to analyse by
hand and that tools were required, so a toolset was developed. It also became clear that
specifications of actual systems are hard to give without flaws, and verification 
is needed to eliminate those flaws. In the early days verification had the form of 
proving that an implementation and a specification were (branching) bisimilar. 

Often it is more convenient to prove properties about aspects of the 
behaviour. For this purpose mCRL2 was extended with a modal logic, in the 
form of the modal µ-calculus with data and time. A typical example of a formula
in modal logic is the following:


\[
\nu X(n{:}\mathbb{N} = 0).\ \forall m{:}\mathbb{N}.\,[\mathit{enter}(m)]\,X(n+m) \;\wedge\; \forall m{:}\mathbb{N}.\,[\mathit{extract}(m)]\,(m \leq n \wedge X(n-m))
\]


which says that the amount extracted using actions extract can never exceed 
the cumulative amount entered via the action enter. The modal µ-calculus with
data is far more expressive than languages such as LTL and CTL*, which can 
be mapped into it [13]. 
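To make the meaning of the formula concrete, the following Python sketch checks the same condition on a single finite trace. It is only an illustration: the formula quantifies over all behaviours of a process, whereas this hypothetical helper `respects_balance` inspects one given sequence of enter and extract actions.

```python
def respects_balance(trace):
    """Return True iff no extract(m) exceeds the amount entered so far.

    `trace` is a list of (action, amount) pairs with action "enter" or
    "extract". The variable `balance` plays the role of the parameter n.
    """
    balance = 0                      # n is initialised to 0 in the formula
    for action, m in trace:
        if action == "enter":
            balance += m             # X(n + m)
        elif action == "extract":
            if m > balance:          # the conjunct m <= n is violated
                return False
            balance -= m             # X(n - m)
    return True
```

For instance, entering 5 and then extracting 3 and 2 is accepted, whereas extracting 1 more afterwards is rejected.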


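[Figure: workflow diagram. mcrl22lps linearises an mCRL2 specification into an
LPS; state space exploration generates an LTS from the LPS; ltsconvert reduces
an LTS and ltscompare compares LTSs (yes/no). lps2pbes combines the LPS with a
µ-calculus formula into a PBES, which pbessolve solves (yes/no, plus an
evidence LPS).]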


Fig. 2. The mCRL2 model checking workflow 


Verification of modal formulae is performed through transformations to lin- 
ear process specifications (LPSs) and parameterised boolean equation systems 
(PBESs) [25,33]. See Fig. 2 for the typical model checking workflow. An LPS 
is a process in normal form, where all state behaviour is translated into data 
parameters. An LPS essentially consists of a set of condition-action-effect rules
saying which action can be done in which state, and as such is a symbolic
representation of a state space. A PBES is constructed using a modal formula and
a linear process. It consists of a parameterised sequence of boolean fixed point 
equations. A PBES can be solved to obtain an answer to the question whether the 
mCRL2 specification satisfies the supplied formula. For more details on PBESs 
and the generation of evidence, refer to Sect. 5. 

Whereas an LPS is a symbolic description of the behaviour of a system, a 
labelled transition system (LTS) makes this behaviour explicit. An LTS can be
defined in the context of a set of action labels. The LTS itself consists of a set
of states, an initial state, and a transition relation between states where each
transition is labelled by an action. The mCRL2 toolset contains the lps2lts
tool to obtain the LTS from a given LPS by means of state space exploration. 
The resulting LTS contains all reachable states of this LPS and the transition 
relation defining the possible actions in each state. The mCRL2 toolset provides 
tools for visualising and reducing LTSs and also for comparing LTSs in a pairwise 
manner. For more details on reducing and comparing LTSs, refer to Sect. 4. 
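The relation between a set of condition-action-effect rules and the explicit LTS can be sketched in a few lines of Python. This is a simplified illustration of state space exploration in general, not the algorithm used by lps2lts; the rule encoding and the modulo-3 counter example are assumptions made for the sketch.

```python
from collections import deque

def explore(initial, rules):
    """Breadth-first generation of an explicit LTS.

    `rules` is a list of (condition, action, effect) triples, where
    condition(state) -> bool, action(state) -> label and
    effect(state) -> successor state.
    Returns the reachable states and the transition relation.
    """
    states = {initial}
    transitions = set()
    todo = deque([initial])
    while todo:
        s = todo.popleft()
        for condition, action, effect in rules:
            if condition(s):
                t = effect(s)
                transitions.add((s, action(s), t))
                if t not in states:      # newly discovered state
                    states.add(t)
                    todo.append(t)
    return states, transitions

# Hypothetical example: a counter modulo 3 that can be reset when non-zero.
rules = [
    (lambda n: True,   lambda n: "inc",   lambda n: (n + 1) % 3),
    (lambda n: n != 0, lambda n: "reset", lambda n: 0),
]
states, transitions = explore(0, rules)
```

The two rules form the symbolic description; `states` and `transitions` form the explicit one, containing only what is reachable from the initial state.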


3 Probabilistic Extensions to mCRL2 


A recent addition to the mCRL2 language is the possibility to specify probabilistic
processes using the construct dist x:D[dist(x)].p(x), which behaves as
the process p(x) with probability dist(x). The distribution dist may be discrete
or continuous. For example, a process describing a light bulb that fails according
to a negative exponential distribution of rate λ is described as


\[
\mathbf{dist}\ r{:}\mathbb{R}\,[\mathit{if}(r > 0,\ \lambda e^{-\lambda r},\ 0)].\ \mathit{fail}@r
\]


where fail@r is the notation for the action fail that takes place at time r.
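As a sanity check on the density used in this example, the following Python sketch integrates it numerically and compares the result with the closed form 1 - e^(-λt) for the probability that the bulb has failed by time t. The function names, the midpoint rule and the sample values in the usage note are illustrative assumptions; nothing here is part of the mCRL2 toolset.

```python
import math

def density(r, rate):
    """Negative exponential density: rate * exp(-rate * r) for r > 0, else 0."""
    return rate * math.exp(-rate * r) if r > 0 else 0.0

def prob_fail_before(t, rate, steps=100_000):
    """Midpoint-rule approximation of the integral of the density over [0, t]."""
    h = t / steps
    return sum(density((i + 0.5) * h, rate) for i in range(steps)) * h
```

For a rate of 0.5 and t = 2, the numerical value agrees with 1 - e^(-1) up to the discretisation error.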

The modelling of probabilistic behaviour with the probabilistic extension of 
mCRL2 can be rather insightful as advocated in [32]. There it is illustrated for 
the Monty Hall problem and the so-called “problem of the lost boarding pass” 
how strong probabilistic bisimulation and reduction modulo probabilistic weak 
trace equivalence can be applied to visualise the probabilistic LTS (PLTS) of the 
underlying probabilistic process as well as to establish the probability of reaching 
a target state (or set of states). We illustrate this by providing the description 
and state space of the Monty Hall problem here. 

In the Monty Hall problem, there are three doors, one of which is hiding a 
prize. A player can select a door. Then one of the remaining doors that does not 
hide the prize is opened. The player can then decide to select the other door. 
If he does so, he will get the prize with probability 2/3. The action prize(true)
indicates that a prize is won. The action prize(false) is an indication that no 
prize is obtained. A possible model in mCRL2 is given below. In this model the 
player switches doors. So, the prize is won if the initially selected door was not 
the door with the prize.


Fig. 3. The non-reduced and reduced state space of the Monty Hall problem. At the 
left the label ✓ abbreviates prize(true) and ✗ stands for prize(false).


sort Doors = struct door1 | door2 | door3 ;
init dist door_with_prize : Doors[1/3].
     dist initially_selected_door : Doors[1/3].
     prize(initially_selected_door ≉ door_with_prize)·δ ;


The generated state space for this model is given in Fig. 3 at the left. From
probabilistic mCRL2 processes probabilistic transition systems can be generated,
which can be reduced modulo strong probabilistic bisimulation [26] (see the next
section). The reduced transition system is provided at the right, and clearly
shows that the prize is won with probability 2/3.
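The probability can be reproduced by enumerating the nine equally likely combinations of the two uniform choices in the model. The Python sketch below does this with exact fractions; it mirrors the structure of the mCRL2 model for the switching player but is an independent illustration, and the function name `prize_distribution` is an assumption.

```python
from fractions import Fraction
from itertools import product

DOORS = ("door1", "door2", "door3")

def prize_distribution():
    """Probability of prize(True) and prize(False) for the switching player."""
    dist = {True: Fraction(0), False: Fraction(0)}
    for door_with_prize, initially_selected_door in product(DOORS, repeat=2):
        weight = Fraction(1, 3) * Fraction(1, 3)   # two independent uniform choices
        # After switching, the prize is won iff the first pick was wrong.
        dist[initially_selected_door != door_with_prize] += weight
    return dist
```

Six of the nine combinations have differing doors, which gives the probability 2/3 for prize(true) and 1/3 for prize(false).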

Moreover, modal mu-calculus formulae yielding a probability, i.e. a real num- 
ber, can be evaluated invoking probabilistic counterparts of the central tools in 
the toolset. For the Monty Hall model the modal formula ⟨prize(true)⟩true
will evaluate to the probability 2/3. The tool that verified this modal formula is
presented in [10]. Although the initial results are promising, the semantic and 
axiomatic underpinning of the process theory for probabilities is demanding. 


4 Behavioural Relations 


Given two LTSs, the ltscompare tool can check whether they are related according
to one of a number of equivalence and refinement relations. Additionally, the
ltsconvert tool can reduce a given LTS modulo an equivalence relation. In the 
following subsections the recently added implementations of several equivalence 
and refinement relations are described. 


4.1 Equivalences 


The ltscompare tool can check simulation equivalence and (weak) trace
equivalence between LTSs. In the latest release an algorithm for checking ready
simulation was implemented and integrated into the toolset [23]. Regarding
bisimulations, the tool can furthermore check strong, branching and weak bisimulation
between LTSs. The latter two are sensitive to so-called internal behaviour,
represented by the action τ. Divergence-preserving variants of these bisimulations
are supported, which take the ability to perform infinite sequences of internal
behaviour into account. The above-mentioned equivalences can also be used by
the ltsconvert tool.

Recently, the Groote/Jansen/Keiren/Wijs algorithm (GJKW) for branching 
bisimulation [24], with complexity O(m(log |Act| + log n)) for m transitions and n states, was implemented.
When tested in practice, it frequently demonstrates performance improvements 
by a factor of 10, and occasionally by a factor of 100 over the previous algorithm 
by Groote and Vaandrager [31]. 

The improved complexity is the result of combining the "process the smaller
half" principle [35] with the key observations made by Groote and Vaandrager
regarding internal transitions [31]. GJKW uses partition refinement to identify 
all classes of equivalent states. Repeatedly, one class (or block) B is selected to 
be the so-called splitter, and each block B’ is checked for the reachability of B, 
where internal behaviour should be skipped over. In case B is reachable from 
some states in B’ but not from others, B’ needs to be split into two subblocks, 
separating the states from which B can and cannot be reached. Whenever a 
fixed-point is reached, the obtained partition defines the equivalence relation. 
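The general idea of partition refinement can be illustrated with a much simpler procedure than GJKW. The Python sketch below computes strong bisimulation classes by naive signature refinement: it ignores internal behaviour, does not use splitters or constellations, and has none of the complexity guarantees discussed here. It is a swapped-in textbook technique, shown only to make the notions of block, splitting and fixed point concrete.

```python
def bisimulation_classes(states, transitions):
    """Coarsest strong-bisimulation partition by naive signature refinement.

    `transitions` is a set of (source, label, target) triples.
    Returns a dict mapping each state to a block identifier.
    """
    block_of = {s: 0 for s in states}            # start with a single block
    while True:
        # Signature: current block plus the set of (label, target block) pairs.
        signature = {
            s: (block_of[s],
                frozenset((a, block_of[t]) for (u, a, t) in transitions if u == s))
            for s in states
        }
        ids = {}
        new_block_of = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block_of.values())):
            return new_block_of                  # no block was split: fixed point
        block_of = new_block_of
```

Because each signature contains the current block, every round can only split blocks, so an unchanged number of blocks means the partition is stable.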

GJKW applies "process the smaller half" in two ways. First of all, it is ensured
that each time a state s is part of a splitter B, the size of B, in terms of number
of states, is at most half the size of the previous splitter in which s resided. To do
this, blocks are partitioned into constellations. A block is selected as splitter iff its
size is at most half the number of states in the constellation in which it resides. 
When a splitter is selected, it is moved into its own, new, constellation, and 
when a block is split, the resulting subblocks remain in the same constellation. 

Second of all, it has to be ensured that splitting a block B’ takes time pro- 
portional to the smallest resulting subblock. To achieve this, two state selection 
procedures are executed in lockstep, one identifying the states in B’ that can 
reach the splitter, and one detecting the other states. Once one of these proce- 
dures has identified all its states, those states can be split off from B’. 

Reachability checking is performed efficiently by using the notion of bottom 
state [31], which is a state that has no outgoing internal transitions leading to 
a state in the same block. It suffices to check whether any bottom state in B’ 
can reach B. Hence, it is crucial that for each block, the set of bottom states is 
maintained during execution of the algorithm. 
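The definition of a bottom state translates directly into code. The Python sketch below recomputes the bottom states of one block from scratch, which is far less efficient than the incremental maintenance GJKW requires; the label "tau" for internal actions and the function name are assumptions made for the illustration.

```python
TAU = "tau"   # assumed label for internal actions in this sketch

def bottom_states(block, block_of, transitions):
    """States of `block` without an internal transition that stays inside it.

    `block_of` maps states to block identifiers; `transitions` is a set of
    (source, label, target) triples.
    """
    has_inert_tau = {
        s for (s, a, t) in transitions
        if a == TAU and block_of[s] == block and block_of[t] == block
    }
    return {s for s, b in block_of.items() if b == block} - has_inert_tau
```

Note that an internal transition leaving the block does not disqualify a state from being a bottom state.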

GJKW is very complicated due to the amount of bookkeeping needed to
achieve the complexity. Among other things, a data structure by Valmari, called
refinable partition [46], is used, together with three copies of all transitions, structured
in different ways to allow fast retrieval in the various stages of the algorithm. 

Besides checking for branching bisimulation, GJKW is used as a basis for 
checking strong bisimulation (in which case it corresponds to the Paige-Tarjan 
algorithm [41]) and as a preprocessing step for checking weak bisimulation. 

For the support of the analysis of probabilistic systems, a number of preliminary
extensions have been made to the mCRL2 toolset. In particular, a new
algorithm has been added to reduce PLTSs (containing both non-deterministic
and probabilistic choice [44]) modulo strong probabilistic bisimulation. This
new Paige-Tarjan style algorithm, called GRV [26] and implemented in the tool
ltspbisim, improves upon the complexity of the best known algorithm so far
by Baier et al. [2]. The GRV algorithm was inspired by work on lumping of
Markov Chains by Valmari and Franceschinis [47] to limit the number of times a
probabilistic transition needs to be sorted. Under the assumption of a bounded
fan-out for probabilistic states, the time complexity of GRV is $O(n_p \log n_a)$, with
$n_p$ equal to the number of probabilistic transitions and $n_a$ being the number of
non-deterministic states in a PLTS.


4.2 Refinement 


In model checking there is typically a single model on which properties, defined in 
another language, are verified. An alternative approach that can be employed is 
refinement checking. Here, the correctness of the model is verified by establishing 
a refinement relation between an implementation LTS and a specification LTS. 
The chosen refinement relation must be strong enough to preserve the desired 
properties of the model, but also weak enough to allow many valid implementa- 
tions. 

For refinement relations the ltscompare tool can check the asymmetric variants
of simulation, ready simulation and (weak) trace equivalence between LTSs.
In the latest release, several algorithms have been added to check (weak) trace, 
(weak) failures and failures-divergences refinement relations based on the algo- 
rithms introduced in [48]. We remark that weak failures refinement is known 
as stable failures refinement in the literature. Several improvements have been 
made to the reference algorithms and the resulting implementation has been 
successfully used in practice, as described in Sect. 7.1. 

The newly introduced algorithms are based on the notion of antichains. These 
algorithms try to find a witness to show that no refinement relation exists. The 
antichain data structure keeps track of the explored part of the state space and 
assists in pruning other parts based on an ordering. If no refinement relation 
exists, the tool provides a counterexample trace to a violating state. To further 
speed up refinement checking, the tool applies divergence-preserving branching 
bisimulation reduction as a preprocessing step. 
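The role of the antichain can be illustrated on the simplest of these relations. The Python sketch below checks strong trace refinement (no internal actions, no failures or divergences) by exploring pairs of an implementation state and the set of specification states reachable by the same trace. It is a simplified illustration under these assumptions, not the implementation in ltscompare, and it omits the bisimulation preprocessing step.

```python
from collections import deque

def trace_refines(impl_init, impl_trans, spec_init, spec_trans):
    """Check that every trace of the implementation is a trace of the spec.

    Both LTSs are assumed to have no internal actions. Transition relations
    are sets of (source, label, target) triples.
    Returns (True, None) or (False, counterexample_trace).
    """
    antichain = {}    # impl state -> list of minimal spec-state sets seen
    todo = deque([(impl_init, frozenset([spec_init]), ())])
    while todo:
        s, spec_set, trace = todo.popleft()
        seen = antichain.setdefault(s, [])
        if any(old <= spec_set for old in seen):
            continue                  # subsumed by a pair explored earlier
        # Drop sets that the new, smaller set subsumes, then record it.
        seen[:] = [old for old in seen if not spec_set <= old] + [spec_set]
        for (u, a, t) in impl_trans:
            if u != s:
                continue
            successors = frozenset(y for (x, b, y) in spec_trans
                                   if x in spec_set and b == a)
            if not successors:        # the spec cannot follow this action
                return False, trace + (a,)
            todo.append((t, successors, trace + (a,)))
    return True, None
```

Pruning is sound because a pair with a smaller set of specification states is at least as likely to lead to a violation, so any pair with a superset need not be explored again.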


5 Model Checking 


Behavioural properties can be specified in a first-order extension of the modal
µ-calculus. The problem of deciding whether a µ-calculus property holds for a
given mCRL2 specification is converted to a problem of (partially) solving a
PBES. Such an equation system consists of a sequence of parameterised fixpoint
equations of the form $(\sigma X(d_1{:}D_1, \ldots, d_n{:}D_n) = \varphi)$, where $\sigma$ is either a
least ($\mu$) or greatest ($\nu$) fixpoint, $X$ is an $n$-ary typed second-order recursion
variable, each $d_i$ is a parameter of type $D_i$ and $\varphi$ is a predicate formula
(technically, a first-order formula with second-order recursion variables). The entire
translation is syntax-driven, i.e., linear in the size of the linear process speci- 
fication and the property. We remark that mCRL2 also comes with tools that 
encode decision problems for behavioural equivalences as equation system solv- 
ing problems; moreover, mCRL2 offers similar translations operating on labelled 
transition systems instead of linear process specifications. 


5.1 Improved Static Analysis of Equation Systems 


The parameters occurring in an equation system are derived from the parameters 
present in process specifications and first-order variables present in µ-calculus
formulae. Such parameters typically determine the set of second-order variables 
on which another second-order variable in an equation system depends. Most 
equation system solving techniques rely on explicitly computing these depen- 
dencies. Obviously, such techniques fail when the set of dependencies is infinite. 
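Consider, for instance, the equation system depicted below:


\[
\begin{array}{l}
\nu X(i,k{:}\mathbb{N}) = (i \neq 1 \vee X(1,k+1)) \wedge \forall m{:}\mathbb{N}.\ Y(2,k+m)\\
\mu Y(i,k{:}\mathbb{N}) = (k < 10 \vee i = 2) \wedge (i \neq 2 \vee Y(1,1))
\end{array}
\]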


Observe that the solution to X(1,1), which is true, depends on the solution to
X(1,2), but also on the solution to Y(2,1+m) for all m, see Fig. 4. Consequently,
techniques that rely on explicitly computing the dependencies will fail
to compute the solution to X(1,1).


Fig. 4. Dependencies of second-order recursion variables on other second-order recur- 
sion variables in an equation system. 


Not all parameters are ‘used’ equally in an equation system: some parameters 
may only influence the truth-value of a second-order variable, whereas others 
may also influence whether an equation depends on second-order variables. For 
instance, in our example, the parameter i of X determines when there is a 
dependency of X on X, and in the equation for Y, parameter i determines when 
there is a dependency of Y on Y. The value for parameter k, however, is only of 
interest in the equation for Y, where it immediately determines its solution when 
i ≠ 2: it will be true when k < 10 and false otherwise. For i = 2, the value of k
is immaterial. As suggested by the dependency graph in Fig. 4, for X(1,1), the
only dependency that is ultimately of consequence is the dependency on Y(1,1),
i.e., k = 1; other values for k cannot be reached.

The techniques implemented in the pbesstategraph tool, and which are 
described in [37], perform a liveness analysis for data variables, such as k in 
our example, and reset these values to default values when their actual value 
no longer matters. To this end, a static analysis determines a set of control 
flow parameters in an equation system. Intuitively, a control flow parameter 
is a parameter in an equation for which we can statically detect that it can 
assume only a finite number of distinct values, and that its values determine 
which occurrences of recursion variables in an equation are relevant. Such control 
flow parameters are subsequently used to approximate the dependencies of an 
equation system, and compute the set of data variables that are still live. As 
soon as a data variable switches from live to not live, it can be set to a default, 
pre-determined value. 

In our example, parameter i in equations X and Y is a control flow parameter
that can take on value 1 or 2. Based on a liveness analysis one can conclude that 
the second argument in both occurrences of the recursion variable X in the 
equation for X can be reset, leading to an equation system that has the same 
solution as the original one: 


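\[
\begin{array}{l}
\nu X(i,k{:}\mathbb{N}) = (i \neq 1 \vee X(1,1)) \wedge \forall m{:}\mathbb{N}.\ Y(2,1)\\
\mu Y(i,k{:}\mathbb{N}) = (k < 10 \vee i = 2) \wedge (i \neq 2 \vee Y(1,1))
\end{array}
\]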


Observe that there are only a finite number of dependencies in the above equation 
system, as the universally quantified variable m no longer induces an infinite 
set of dependencies. Consequently, it can be solved using techniques that rely 
on computing the dependencies in an equation system. The experiments in [37] 
show that pbesstategraph in general speeds up solving when it is able to reduce 
the underlying set of dependencies in an equation system, and when it is not 
able to do so, the overhead caused by the analysis is typically small. 
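Once the dependencies are finite, the instances reachable from X(1,1) form a small boolean equation system that can be solved by plain fixpoint iteration. The Python sketch below does this with a naive recursive solver; it is an illustration of the principle only, unrelated to the algorithms in pbessolve, and the instantiation into three boolean equations was carried out by hand from the reduced system.

```python
def solve(equations, env=None):
    """Solve a boolean equation system given as [(sign, name, rhs), ...].

    `sign` is "mu" or "nu"; `rhs` maps an environment (dict of booleans) to
    a bool and must be monotone. Earlier equations take priority.
    """
    env = dict(env or {})
    if not equations:
        return env
    (sign, name, rhs), rest = equations[0], equations[1:]
    value = (sign == "nu")            # nu starts from True, mu from False
    while True:
        candidate = solve(rest, {**env, name: value})
        new_value = rhs(candidate)
        if new_value == value:        # fixed point for this variable
            return candidate
        value = new_value

# Instances reachable from X(1,1) in the reduced system:
#   X(1,1) = (1 != 1 or X(1,1)) and Y(2,1)
#   Y(2,1) = (1 < 10 or 2 == 2) and (2 != 2 or Y(1,1))
#   Y(1,1) = (1 < 10 or 1 == 2) and (1 != 2 or Y(1,1))
equations = [
    ("nu", "X(1,1)", lambda e: e["X(1,1)"] and e["Y(2,1)"]),
    ("mu", "Y(2,1)", lambda e: e["Y(1,1)"]),
    ("mu", "Y(1,1)", lambda e: True),
]
solution = solve(equations)
```

The solver assigns true to all three variables, in line with the observation that the solution to X(1,1) is true.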


5.2 Infinite-State Model Checking 


Two new experimental tools, pbessymbolicbisim [40] and pbesabsinthe [16], 
support model checking of infinite-state systems. These are two of the few sym- 
bolic tools in the toolset. Regular PBES solving techniques, such as those imple- 
mented in pbessolve, store each state explicitly, which prohibits the analysis of 
infinite-state systems. In pbessymbolicbisim, (infinite) sets of states are represented
using first-order logic expressions. Instead of straightforward exploration,
it performs symbolic partition refinement based on the information about the 
underlying state space that is contained in the PBES. The approximation of the 
state space is iteratively refined, until it equals the bisimulation quotient of that 
state space. Moreover, since the only goal of this tool is to solve a PBES, i.e. give 
the answer true or false, additional abstraction techniques can be very coarse. 
As a result, the tool often terminates before the bisimulation quotient has been 
fully computed.

The second tool, pbesabsinthe, requires the user to specify an abstraction
mapping manually. If the abstraction mapping satisfies certain criteria, it will 
be used to generate a finite underlying graph structure. By solving the graph 
structure, the tool obtains a solution to the PBES under consideration. 

The theoretical foundations of pbessymbolicbisim and pbesabsinthe are 
similar: pbessymbolicbisim computes an abstraction based on an equiva- 
lence relation and pbesabsinthe works with preorder-based abstractions. Both 
approaches have their own strengths and weaknesses: pbesabsinthe requires 
the user to specify an abstraction manually, whereas pbessymbolicbisim runs 
fully automatically. However, the analysis of pbessymbolicbisim can be very 
costly for larger models. A prime application of pbessymbolicbisim and 
pbesabsinthe is the verification of real-time systems. 


5.3 Evidence Extraction 


One of the major new features of the mCRL2 toolset that, until recently, was 
lacking is the ability to generate informative counterexamples (resp. witnesses) 
from a failed (resp. successful) verification. The theory of evidence generation 
that is implemented is based on that of [15], which explains how to extract diagnostic
evidence for µ-calculus formulae via the Least Fixed-Point (LFP) logic.
The diagnostic evidence that is extracted is a subgraph of the original labelled
transition system that allows the same proof of a failing (or successful)
verification to be reconstructed. Note that since the input language for properties can encode
branching-time and linear-time properties, diagnostic evidence cannot always be
presented in terms of traces or lassos; for linear-time properties, however, the
theory permits the generation of trace- and lasso-shaped evidence.

A straightforward implementation of the ideas of [15] in the setting of equa- 
tion systems is, however, hampered by the fact that the original evidence theory 
builds on a notion of proof graph that is different from the one developed in [14] 
for equation systems. In [49], we show that these differences can be overcome by 
modifying the translation of the model checking problem as an equation system 
solving problem. This new translation is invoked by passing the flag ‘-c’ to the 
tool lps2pbes. The new equation system solver pbessolve can be directed to 
extract and store the diagnostic evidence from an equation system by passing 
the linear process specification along with this equation system; the resulting 
evidence, which is stored as a linear process specification, can subsequently be 
simulated, minimised or visualised for further inspection. 

Figure 5, taken from [49], gives an impression of the shape of diagnostic evidence
that can be generated using the new tooling. The labelled transition system
that is depicted presents the counterexample to a formula for the CERN job
storage management system [43] that states that invariantly, each task that is 
terminated is inevitably removed. Note that this counterexample is obtained by 
minimising the original 142-state large evidence produced by our tools modulo 
branching bisimulation.


Fig. 5. Counterexamples for the requirement that each task in a terminating state is
eventually removed for the Storage Management Systems. We omitted all edge labels, 
and the dashed line indicates a lengthy path through a number of other states (not 
depicted), whereas the dotted transitions are 3D artefacts. 


6 User-Friendly GUI 


The techniques explained in this paper may not be easily accessible to users that 
are new to the mCRL2 toolset. This is because the toolset is mostly intended 
for scientific purposes; at least initially, not much attention had been spent on 
user friendliness. As the toolset started to get used in workshops and academic 
courses however, the need for this user friendliness increased. This gave rise to 
the tools mcrl2-gui, a graphical alternative to the command line usage of the
toolset, and mcrl2xi, an editor for mCRL2 specifications. However, to use the
functionality of the toolset it was still required to know about the individual
tools. For instance, to visualise the state space of an mCRL2 specification, one
needed to manually run the tools mcrl22lps, lps2lts and ltsgraph.

As an alternative, the tool mcrl2ide has been added to the mCRL2 toolset.
This tool provides a graphical user interface with a text editor to create and edit 
mCRL2 specifications and it provides the core functionality of the toolset such 
as visualising the (reduced) state space and verifying properties. The tools that 
correspond to this functionality are abstracted away from the user; only one or 
a few button clicks are needed. 

See Fig. 6 for an instance of mcrl2ide with an open project, consisting of
an mCRL2 specification and a number of properties. The UI consists of an 
editor for mCRL2 specifications, a toolbar at the top, a dock listing defined 
properties on the right and a dock with console output at the bottom. The 
toolbar contains buttons for creating, opening and saving a project and buttons 
for running tools. The properties dock allows verifying each single property on 
the given mCRL2 specification, editing/removing properties and showing the 
witness/counterexample after verification. 


7 Applications 


The mCRL2 toolset and its capabilities have not gone unnoticed. Over the years 
numerous initiatives and collaborations have sprouted to apply its functionality. 


7.1 mCRL2 as a Verification Back-End 


The mCRL2 toolset enjoys a sustained application in industry, often in the con- 
text of case studies carried out by MSc or PhD students. Moreover, the mCRL2


Fig. 6. An instance of mcrl2ide in Windows 10 with an mCRL2 specification of the
alternating bit protocol. The properties in the dock on the right are (from top to 
bottom) true, false and not checked yet. 


toolset is increasingly used as a back-end aiming at verification of higher-level 
languages. Some of these applications are built on academic languages; e.g., 
in [22] the Algebra for Wireless Networks is translated to mCRL2, enabling 
the verification of protocols for Mobile Ad hoc Networks and Wireless Mesh 
Networks. Models written in the state-machine based Simple Language of Com- 
municating Objects (SLCO) are translated to mCRL2 to verify shared-memory 
concurrent systems and reason about the sequential consistency of automatically 
generated multi-threaded software [42]. Others are targeting more broadly used 
languages; e.g., in [39], Go programs are translated to mCRL2 and the mCRL2 
toolset is used for model checking Go programs. 

The use of mCRL2 in industry is furthermore driven by the current Formal 
Model-Driven Engineering (FMDE) trend. In the FMDE paradigm, programs 
written in a Domain-Specific Language (DSL) are used to generate both exe- 
cutable code and verifiable models. A recent example is the commercial FMDE 
toolset Dezyne developed by Verum, see [9], which uses mCRL2 to check for 
livelocks and deadlocks, and which relies on mCRL2’s facilities to check for 
refinement relations (see Sect. 4.2) to check for interface compliance. Similar 
languages and methodologies are under development at other companies. For 
instance, ASML, one of the world’s leading manufacturers of chip-making equipment,
is developing the Alias language, and Océ, a global leading company in
digital imaging, industrial printing and collaborative business services, is devel- 
oping the OIL language. Both FMDE solutions build on mCRL2. 

We believe the FMDE trend will continue in the coming years and that it 
will influence the development of the toolset. For example, the use of refinement 
checking in the Dezyne back-end has forced us to implement several optimisa- 
tions (cf. Sect. 4.2). Furthermore, machine-generated specifications are typically 
longer and more verbose than handwritten specifications. This will require a 
more efficient implementation of the lineariser (as implemented in mcrl22lps)
in the coming years.


7.2 Software Product Lines 


A software product line (SPL) is a collection of systems, individually called prod- 
ucts, sharing a common core. However, at specific points the products may show 
slightly different behaviour dependent on the presence or absence of so-called 
features. The overall system can be concisely represented as a featured transi- 
tion system (FTS), an LTS with both actions and boolean expressions over a set 
of features decorating the transitions (see [12]). If a product, given its features, 
fulfils the boolean expression guarding the transition, the transition may be taken
by the product. Basically, there are two ways to analyse SPLs: product-based 
and family-based. In product-based analysis each product is verified separately; 
in family-based model checking one seeks to verify a property for a group of 
products, referred to as a family, as a whole. 
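The step from an FTS to the LTS of a single product, as used in product-based analysis, can be sketched as a simple projection. The Python fragment below is an illustration only; the encoding of feature expressions as predicates over a set of feature names and the vending machine example are assumptions, not taken from [12].

```python
def project(fts_transitions, product_features):
    """LTS of one product: keep the transitions whose feature guard holds.

    Each FTS transition is (source, action, guard, target), where
    guard(features) -> bool and `features` is a set of feature names.
    """
    return {(s, a, t) for (s, a, guard, t) in fts_transitions
            if guard(product_features)}

# Hypothetical vending machine with an optional "tea" feature.
fts = [
    (0, "pay",    lambda f: True,       1),
    (1, "coffee", lambda f: True,       0),
    (1, "tea",    lambda f: "tea" in f, 0),
]
```

Projecting onto the empty feature set removes the tea transition, while the product with the tea feature keeps all three transitions. Family-based analysis avoids computing such a projection for every product separately.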

Traditionally, dedicated model checkers are exploited for the verification of 
SPLs. Examples of such SPL model checkers are SNIP and ProVeLines by the 
team of [12] that are derived from SPIN. However, the mCRL2 toolset as-is, 
without specific modifications, has also been used to compare product-based 
vs. family-based model checking [3,5,7]. For this, the extension of the modal 
µ-calculus for the analysis of FTSes proposed in [4], which combines actions
and feature expressions for its modalities, was translated into the first-order
µ-calculus [25], the property language of the mCRL2 toolset. As a result, verification
of SPLs can be done using the standard workflow for mCRL2, achieving
family-based model checking without a family-based model checker [18], with 
running times slightly worse than, but comparable to those of dedicated tools. 


8 Related Work 


Among the many model checkers available, the CADP toolset [21] is the most
closely related to mCRL2. In CADP, specifications are written in the LOTOS NT
language, which has been derived from the E-LOTOS ISO standard. Similar to
mCRL2, CADP relies on action-based semantics, i.e., state spaces are stored as
an LTS. Furthermore, the verification engine in CADP takes a µ-calculus formula
as input and encodes it in a BES or PBES. However, CADP has limited support
for µ-calculus formulae with fixpoint alternation and, unlike mCRL2, does
not support arbitrary nesting of fixpoints. Whereas the probabilistic analysis
tools for mCRL2 are still in their infancy, CADP offers more advanced analy- 
sis techniques for Markovian probabilistic systems. The user-license of CADP 
is restrictive: CADP is not open source and a free license is only available for 
academic use. 

Another toolset that is based on process algebra is PAT [45]. This toolset has 
native support for the verification of real-time specifications and implements on- 
the-fly reduction techniques, in particular partial-order reduction and symmetry 
reduction. PAT can perform model checking of LTL properties. 

The toolset LTSMIN [36] has a unique architecture in the sense that it is 
language-independent. One of the supported input languages is mCRL2. Thus, 
the state space of an mCRL2 specification can also be generated using LTSMIN’s 
high-performance multi-core and symbolic back-ends. 

Well-known tools that have less in common with mCRL2 are SPIN [34], 
NuSMV [11], PRISM [38] and UPPAAL [6]. Each of these tools has its own 
strengths. First of all, SPIN is an explicit-state model checker that incorporates 
advanced techniques to reduce the size of the state space (partial-order reduction 
and symmetry reduction) or the amount of memory required (bit hashing). SPIN 
supports the checking of assertions and LTL formulae. Secondly, NuSMV is a
powerful symbolic model checker that offers model checking algorithms such
as bounded model checking and counterexample-guided abstraction refinement
(CEGAR). The tools PRISM and UPPAAL focus on quantitative aspects of 
model checking. The main goal of PRISM is to analyse probabilistic systems, 
whereas UPPAAL focusses on systems that involve real-time behaviour. 


9 Conclusion 


In the past six years many additions and changes have been made to the mCRL2 
toolset and language to improve its expressivity, usability and performance. 
Firstly, the mCRL2 language has been extended to enable modelling of prob- 
abilistic behaviour. Secondly, by adding the ability to check refinement and to 
do infinite-state model checking the mCRL2 toolset has become applicable in 
a wider range of situations. Also, the introduction of the generation of coun- 
terexamples and witnesses for model checking problems and the introduction of 
an enhanced GUI has improved the experience of users of the mCRL2 toolset. 
Lastly, refinements to underlying algorithms, such as those for equivalence reduc- 
tions and static analyses of PBESs, have resulted in lowered running times when 
applying the corresponding tools. 

For the future, we aim to further strengthen several basic building blocks of 
the toolset, in particular the term library and the rewriter. The term library is 
responsible for storage and retrieval of terms that underlie mCRL2 data expressions.
The rewriter manipulates data expressions based on rewrite rules specified
by the user. These two components have evolved over time but
are only sparsely documented. It has proven difficult to revitalise the
current implementation or to amend it to experiment with new ideas.
As part of this effort, one of the aims is to investigate the benefits of multi-core algorithms,
expecting a subsequent speed-up for many other algorithms in the toolset.


Abstract. Many transaction systems distribute, partition, and replicate
their data for scalability, availability, and fault tolerance. However,
observing and maintaining strong consistency of distributed and partially 
replicated data leads to high transaction latencies. Since different appli- 
cations require different consistency guarantees, there is a plethora of 
consistency properties—from weak ones such as read atomicity through 
various forms of snapshot isolation to stronger serializability properties— 
and distributed transaction systems (DTSs) guaranteeing such proper- 
ties. This paper presents a general framework for formally specifying a 
DTS in Maude, and formalizes in Maude nine common consistency prop- 
erties for DTSs so defined. Furthermore, we provide a fully automated 
method for analyzing whether the DTS satisfies the desired property for 
all initial states up to given bounds on system parameters. This is based 
on automatically recording relevant history during a Maude run and 
defining the consistency properties on such histories. To the best of our 
knowledge, this is the first time that model checking of all these proper- 
ties in a unified, systematic manner is investigated. We have implemented 
a tool that automates our method, and use it to model check state-of- 
the-art DTSs such as P-Store, RAMP, Walter, Jessy, and ROLA. 


1 Introduction 


Applications handling large amounts of data need to partition their data for scal- 
ability and elasticity, and need to replicate their data across widely distributed 
sites for high availability and fault and disaster tolerance. However, guaran- 
teeing strong consistency properties for transactions over partially replicated 
distributed data requires a lot of costly coordination that results in long transaction
delays. Different applications require different consistency guarantees, and
balancing well the trade-off between performance and consistency guarantees is 
key to designing distributed transaction systems (DTSs). There is therefore a 
plethora of consistency properties for DTSs over partially replicated data—from 
weak properties such as read atomicity through various forms of snapshot isola- 
tion to strong serializability guarantees—and DTSs providing such guarantees. 

DTSs and their consistency guarantees are typically specified informally and 
validated only by testing; there is very little work on their automated formal 
analysis (see Section 8). We have previously formally modeled and analyzed individual
state-of-the-art industrial and academic DTSs, such as Google’s Megastore,
Apache Cassandra, Walter, P-Store, Jessy, ROLA, and RAMP, in Maude [14]. 

In this paper we present a generic framework for formalizing both DTSs and 
their consistency properties in Maude. The modeling framework is very general 
and should allow us to naturally model most DTSs. We formalize nine popular 
consistency models in this framework and provide a fully automated method— 
and a tool which automates this method—for analyzing whether a DTS specified 
in our framework satisfies the desired consistency property for all initial states 
with the user-given number of transactions, data items, sites, and so on. 

In particular, we show how one can automatically add a monitoring mech- 
anism which records relevant history during a run of a DTS specified in our 
framework, and we define the consistency properties on such histories so that 
the DTS can be directly model checked in Maude. We have implemented a tool 
that uses Maude’s meta-programming features to automatically add the moni- 
toring mechanism, that automatically generates all the desired initial states, and 
that performs the Maude model checking. We have applied our tool to model 
check state-of-the-art DTSs such as variants of RAMP, P-Store, ROLA, Walter, 
and Jessy. To the best of our knowledge, this is the first time that model checking 
of all these properties in a unified, systematic manner is investigated. 

This paper is organized as follows. Section 2 provides background on rewrit- 
ing and Maude. Section 3 gives an overview of the consistency properties that 
we formalize. Section 4 presents our framework for modeling DTSs in Maude, 
and Section 5 explains how to record the history in such models. Section 6 for- 
mally defines consistency models as Maude functions on such recorded histories. 
Section 7 briefly introduces our tool which automates the entire process. Finally, 
Section 8 discusses related work and Section 9 gives some concluding remarks. 


2 Rewriting Logic and Maude 


Maude [14] is a rewriting-logic-based executable formal specification language 
and high-performance analysis tool for object-based distributed systems. 
A Maude module specifies a rewrite theory $(\Sigma, E \cup A, R)$, where:


— $\Sigma$ is an algebraic signature; i.e., a set of sorts, subsorts, and function symbols.
— $(\Sigma, E \cup A)$ is a membership equational logic theory [14], with $E$ a set of possibly
conditional equations and membership axioms, and $A$ a set of equational
axioms such as associativity, commutativity, and identity, so that equational
deduction is performed modulo the axioms $A$. The theory $(\Sigma, E \cup A)$ specifies
the system’s states as members of an algebraic data type.

— $R$ is a collection of labeled conditional rewrite rules $[l] : t \longrightarrow t'$ if $cond$,
specifying the system’s local transitions.


Equations and rewrite rules are introduced with, respectively, keywords eq, 
or ceq for conditional equations, and rl and crl. The mathematical variables 
in such statements are declared with the keywords var and vars, or can have 
the form var:sort and be introduced on the fly. An equation $f(t_1,\ldots,t_n) = t$
with the owise (“otherwise”) attribute can be applied to a subterm $f(\ldots)$ only
if no other equation with left-hand side $f(u_1,\ldots,u_n)$ can be applied. Maude
also provides standard parameterized data types (sets, maps, etc.) that can 
be instantiated (and renamed); for example, pr SET{Nat} * (sort Set{Nat} to 
Nats) defines a sort Nats of sets of natural numbers. 


A class declaration class $C$ | $att_1 : s_1, \ldots, att_n : s_n$ declares a class
$C$ of objects with attributes $att_1$ to $att_n$ of sorts $s_1$ to $s_n$. An object instance of
class $C$ is represented as a term < $O$ : $C$ | $att_1 : val_1, \ldots, att_n : val_n$ >, where
$O$, of sort Oid, is the object’s identifier, and where $val_1$ to $val_n$ are the current
values of the attributes $att_1$ to $att_n$. A message is a term of sort Msg. A system
state is modeled as a term of the sort Configuration, and has the structure of
a multiset made up of objects and messages.

The dynamic behavior of a system is axiomatized by specifying each of its 
transition patterns by a rewrite rule. For example, the rule (with label l)


rl [l] : m(O,w)
         < O : C | a1 : x, a2 : O', a3 : z >
       =>
         < O : C | a1 : x + w, a2 : O', a3 : z >
         m'(O',x) .


defines a family of transitions in which a message m(O,w) is read and consumed
by an object O of class C, whose attribute a1 is changed to x + w, and a new
message m'(O',x) is generated. Attributes whose values do not change and do
not affect the next state, such as a3 and a2, need not be mentioned in a rule.
Maude also supports metaprogramming in the sense that a Maude specification
M can be represented as a term $\overline{M}$ (of sort Module), so that a module
transformation can be defined as a Maude function f : Module -> Module.
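
For readers unfamiliar with object-based rewriting, the effect of rule l above on a configuration can be mimicked in Python. This is only a sketch of the rule's effect, not of Maude's matching modulo associativity and commutativity. The dictionary-based configuration and the function name apply_rule_l are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# A configuration is modeled as objects (keyed by identifier) plus a list of messages.
Objects = Dict[str, Dict[str, object]]
Message = Tuple[str, str, int]  # (message name, addressee, payload)


def apply_rule_l(objects: Objects, messages: List[Message]) -> bool:
    """Apply rule l once, in place; return False if no message m(O,w) matches."""
    for msg in messages:
        name, addressee, w = msg
        if name == "m" and addressee in objects:
            obj = objects[addressee]
            x = obj["a1"]
            obj["a1"] = x + w                      # a1 : x becomes a1 : x + w
            messages.remove(msg)                   # m(O,w) is consumed
            messages.append(("m'", obj["a2"], x))  # m'(O',x) is generated
            return True
    return False


objects = {"o1": {"a1": 5, "a2": "o2", "a3": "unchanged"}}
messages = [("m", "o1", 3)]
applied = apply_rule_l(objects, messages)
```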


Reachability Analysis in Maude. Maude provides a number of analysis methods, 
including rewriting for simulation purposes, reachability analysis, and linear tem- 
poral logic (LTL) model checking. In this paper, we use reachability analysis. 
Given an initial state init, a state pattern pattern and an (optional) condition 
cond, Maude’s search command searches the reachable state space from init in 
a breadth-first manner for states that match pattern such that cond holds: 


search [bound] init =>! pattern such that cond . 


where bound is an upper bound on the number of solutions to look for. The arrow
=>! means that Maude only searches for final states (i.e., states that cannot be
further rewritten) that match pattern and satisfy cond. If the arrow is instead
=>* then Maude searches for all reachable states satisfying the search condition. 
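
The difference between the two arrows can be illustrated with a small breadth-first search in Python. This is a sketch of the idea, not of Maude's implementation: the function search and its parameters successors, matches, and final_only are names chosen here, and the pattern and condition are merged into one predicate.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def search(init: Hashable,
           successors: Callable[[Hashable], Iterable[Hashable]],
           matches: Callable[[Hashable], bool],
           final_only: bool = True,
           bound: Optional[int] = None) -> List[Hashable]:
    """Breadth-first search; final_only=True mimics =>!, False mimics =>*."""
    found: List[Hashable] = []
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        # A final state is one with no successors (it cannot be rewritten).
        if matches(state) and (not final_only or not nexts):
            found.append(state)
            if bound is not None and len(found) >= bound:
                return found
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


# Toy system: a counter that can grow by 1 or 2 until it reaches 3 or more.
def step(n):
    return [n + 1, n + 2] if n < 3 else []


def is_even(n):
    return n % 2 == 0


finals = search(0, step, is_even, final_only=True)
reachable = search(0, step, is_even, final_only=False)
first = search(0, step, is_even, final_only=False, bound=1)
```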


3 Transactional Consistency


Different applications require different consistency guarantees. There are there- 
fore many consistency properties for DTSs on partially replicated distributed 
data stores. This paper focuses on the following nine, which span a spectrum 
from weak consistency such as read committed to strong consistency like serial- 
izability: 


Read committed (RC) [6] disallows a transaction¹ from seeing any uncommitted
or aborted data.

Cursor stability (CS) [16], widely implemented by commercial SQL systems 
(e.g., IBM DB2 [1]) and academic prototypes (e.g., MDCC [21]), guarantees 
RC and in addition prevents the lost update anomaly. 

Read atomicity (RA) [5] guarantees that either all or none of a (distributed) 
transaction’s updates are visible to other transactions. For example, if Alice 
and Bob become friends on social media, then Charlie should not see that 
Alice is a friend of Bob’s, and that Bob is not a friend of Alice’s. 

Update atomicity (UA) [12,25] guarantees read atomicity and prevents the 
lost update anomaly. 

Snapshot isolation (SI) [6] requires a multi-partition transaction to read from
a snapshot of a distributed data store that reflects a single commit order of
transactions across sites, even if they are independent of each other: Alice
sees Charlie’s post before seeing David’s post if and only if Bob sees the two
posts in the same order. Charlie and David must therefore coordinate the
order of committing their posts even if they do not know each other.

Parallel snapshot isolation (PSI) [36] weakens SI by allowing different commit
orders at different sites, while guaranteeing that a transaction reads the
most recent version committed at the transaction execution site, as of the
time when the transaction begins. For example, Alice may see Charlie’s post
before seeing David’s post, even though Bob sees David’s post before Charlie’s
post, as long as the two posts are independent of each other. Charlie and
David can therefore commit their posts without waiting for each other.

Non-monotonic snapshot isolation (NMSI) [4] weakens PSI by allowing a
transaction to read a version committed after the transaction begins: Alice
may see Bob’s post that committed after her transaction started executing.

Serializability (SER) [33] ensures that the execution of concurrent transactions
is equivalent to one where the transactions are run one at a time.

Strict serializability (SSER) strengthens SER by enforcing the serial order
to follow real time.


¹ A transaction is a user application request, typically consisting of a sequence of read
and/or write operations on data items, that is submitted to a (distributed) database.


4 Modeling Distributed Transaction Systems in Maude


This section presents a framework for modeling in Maude DTSs that satisfy the 
following general assumptions: 


— We can identify and record “when”² a transaction starts executing at its
server/proxy and “when” the transaction is committed or aborted at the
different sites involved in its validation.
— The transactions record their read and write sets. 


If such a DTS is modeled in this framework, our tool can automatically model
check whether it satisfies the above consistency properties, as long as it can detect 
the read and write sets and the above events: start of transaction execution, and 
abort/commit of a transaction at a certain site. This section explains how the 
system should be modeled so that our tool automatically discovers these events. 
We make the following additional assumptions about the DTSs we target: 


— The database is distributed across a number of sites (servers, or replicas)
that communicate by asynchronous message passing. Data are partially replicated
across these sites: a data item may be replicated/stored at more than
one site. The sites replicating a data item are called that item’s replicas. 

— Systems evolve by message passing or local computations. Servers communi- 
cate by asynchronous message passing with arbitrary but finite delays. 

— A client forwards a transaction to be executed to some server (called the 
transaction’s executing server or proxy), which executes the transaction. 

— Transaction execution should terminate in commit or abort. 


4.1 Modeling DTSs in Maude 


A DTS is modeled in an object-oriented style, where the state consists of a number
of replica objects, each modeling a local database/server/site, and a number
of messages traveling between the replica objects. A transaction is modeled as 
an object which resides inside the replica object executing the transaction. 


Basic Data Types. There are user-defined sorts Key for data items (or keys) and 
Version for versions of data items, with a partial order < on versions, where v < v’
denotes that v’ is a later version than v. We then define key-version pairs
<key, version> and sets of such pairs, that model a transaction’s read and write 
sets, as follows: 


sorts Key Version KeyVersion . 
op <_,_> : Key Version -> KeyVersion . 
pr SET{KeyVersion} * (sort Set{KeyVersion} to KeyVersions) .


² Since we do not necessarily deal with real-time systems, this “when” may not denote
the real time, but when the event takes place relative to other events.

To track the status of a transaction (on non-proxies, or remote servers) we
define a sort TxnStatus consisting of some transaction’s identifier and its status; 
this is used to indicate whether a remote transaction (one executed on another 
server) is committed on this server: 


op [_,_] : Oid Bool -> TxnStatus [ctor] .
pr SET{TxnStatus} * (sort Set{TxnStatus} to TxnStatusSet) .


Modeling Replicas. A replica (or site) stores parts of the database, executes the 
transactions for which it is the proxy, helps validate other transactions, and is
formalized as an object instance of a subclass of the following class Replica: 


class Replica | executing : Configuration, committed : Configuration,
                aborted : Configuration, decided : TxnStatusSet .


The attributes executing, committed, and aborted contain, respectively, trans- 
actions that are being executed, and have been committed or aborted on the exe- 
cuting server; decided is the status of transactions executed on other servers. 

To model a system-specific replica a user should specify it as an object 
instance of a subclass of the class Replica with new attributes. 


Example 1. A replica in our Maude model of Walter [26] is modeled as an object 
instance of the following subclass Walter-Replica of class Replica that adds 
14 new attributes (only 4 shown below): 


class Walter-Replica | store : Datastore, sqn : Nat,
                       locked : Locks, votes : Vote .
subclass Walter-Replica < Replica .


Modeling Transactions. A transaction should be modeled as an object of a sub- 
class of the following class Txn: 


class Txn | readSet : KeyVersions, writeSet : KeyVersions . 


where readSet and writeSet denote the key/version pairs read and written by 
the transaction, respectively. 


Example 2. Walter transactions can be modeled as object instances of the sub- 
class Walter-Txn with four new attributes: 


class Walter-Txn | operations : OperationList, localVars : LocalVars,
                   startVTS : VectorTimestamp, txnSQN : Nat .
subclass Walter-Txn < Txn . 


Modeling System Dynamics. We describe how the rewrite rules defining the start 
of a transaction execution and aborts and commits at different sites should be 
defined so that our tool can detect these events. 

The start of a transaction execution must be modeled by a rewrite rule where
the transaction object appears in the proxy server’s executing attribute in 
the right-hand side, but not in the left-hand side, of the rewrite rule. 


Example 3. A Walter replica starts executing a transaction TID by moving
TID from gotTxns (which buffers transactions from clients) to executing:³


rl [start-txn] :
   < RID : Walter-Replica | executing : TRANSES, committedVTS : VTS,
                            gotTxns : < TID : Txn | startVTS : empty > ;; TXNS >
 =>
   < RID : Walter-Replica | gotTxns : TXNS,
                            executing : TRANSES < TID : Txn | startVTS : VTS > > .


When a transaction is committed on the executing server, the transaction 
object must appear in the committed attribute in the right-hand side—but 
not in the left-hand side—of the rewrite rule. Furthermore, the readSet and 
writeSet attributes must be explicitly given in the transaction object. 


Example 4. In Walter, when all operations of an executing read-only trans- 
action have been performed, the proxy commits the transaction directly: 


rl [commit-read-only-txn] :
   < RID : Walter-Replica | committed : TRANSES',
                            executing : TRANSES
       < TID : Txn | operations : nil, writeSet : empty, readSet : RS > >
 =>
   < RID : Walter-Replica | committed : (TRANSES' < TID : Txn | >),
                            executing : TRANSES > .


When a transaction is aborted by the executing server, the transaction object 
must appear in the aborted attribute in the right-hand side, but not in the 
left-hand side, of a rewrite rule. Again, the transaction should present its 
attributes writeSet and readSet (to be able to record relevant history). See 
our longer report [27] for an example of such a rule. 

A rewrite rule that models when a transaction’s status is decided remotely 
(i.e., not on the executing server) must contain in the right-hand side (only) 
the transaction’s identifier and its status in the replica’s decided attribute. 
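
The four requirements above all identify an event by what appears in a replica attribute on the right-hand side of a rule but not on the left-hand side. The Python sketch below illustrates this criterion on a deliberately simplified rule representation. It is not the tool's meta-level implementation: the flat Side dictionaries and the function classify_events are assumptions made for this illustration, and the treatment of decided as a set difference is a simplification.

```python
from typing import Dict, Set

# Each side of a rule is summarised by the transaction identifiers found in
# each replica attribute (an assumption of this sketch).
Side = Dict[str, Set[str]]


def classify_events(lhs: Side, rhs: Side) -> Dict[str, Set[str]]:
    """Map each event kind to the transactions that newly appear on the right."""
    events: Dict[str, Set[str]] = {}
    for attribute, kind in (("executing", "start"),
                            ("committed", "commit"),
                            ("aborted", "abort"),
                            ("decided", "decided")):
        fresh = rhs.get(attribute, set()) - lhs.get(attribute, set())
        if fresh:
            events[kind] = fresh
    return events


# Shape of start-txn: TID moves from gotTxns to executing.
start_events = classify_events({"gotTxns": {"TID"}, "executing": set()},
                               {"gotTxns": set(), "executing": {"TID"}})

# Shape of commit-read-only-txn: TID moves from executing to committed.
commit_events = classify_events({"executing": {"TID"}, "committed": set()},
                                {"executing": set(), "committed": {"TID"}})
```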


These requirements are not very strict. The Maude models of the DTSs RAMP 
[29], Faster [24], Walter [26], ROLA [25], Jessy [28], and P-Store [32] can all 
be seen as instantiations of our modeling framework, with very small syntactic 
changes, such as defining transaction and replica objects as subclasses of Txn 
and Replica, changing the names of the attributes and sorts, etc. The Apache 
Cassandra NoSQL key-value store can be seen as a transaction system where 
each transaction is a single operation; the Maude model of Cassandra in [30] can 
also be easily modified to fit within our modeling framework. 
³ We do not give variable declarations, but follow the convention that variables are
written in (all) capital letters.


5 Adding Execution Logs


To formalize and analyze consistency properties of distributed transaction sys- 
tems we add an “execution log” that records the history of relevant events during 
a system execution. This section explains how this history recording can be added 
automatically to a model of a DTS that is specified as explained in Section 4. 


5.1 Execution Log 


To capture the total order of relevant events in a run, we use a “logical global 
clock” to order all key events (i.e., transaction starts, commits, and aborts). This 
clock is incremented by one each time such an event takes place. 

A transaction in a replicated DTS is typically committed both locally (at 
its executing server) and remotely at different times. To capture this, we define 
a “time vector” using Maude’s map data type that maps replica identifiers (of 
sort Oid) to (typically “logical”) clock values (of sort Time, which here are the
natural numbers: subsort Nat < Time): 


pr MAP{Oid,Time} * (sort Map{Oid,Time} to VectorTime) .


where each element in the mapping has the form replica-id |-> time. 

An execution log (of sort Log) maps each transaction (identifier) to a record 
<proxy, issueTime, finishTime, committed, reads, writes>, with proxy its proxy
server, issueTime the starting time at its proxy server, finishTime the com- 
mit/abort times at each relevant server, committed a flag indicating whether the 
transaction is committed at its proxy, reads the key-version pairs read by the 
transaction, and writes the key-version pairs written: 


sort Record .
op <_,_,_,_,_,_> : Oid Time VectorTime
                   Bool KeyVersions KeyVersions -> Record .
pr MAP{Oid,Record} * (sort Map{Oid,Record} to Log) .


5.2 Logging Execution History 


We show how the relevant history of an execution can be recorded during a run 
of our Maude model by transforming the original Maude model into one which 
also records this history. 

First, we add to the state a Monitor object that stores the current logical 
global time in the clock attribute and the current log in the log attribute: 


< M : Monitor | clock : Time, log : Log >. 


The log is updated each time an interesting event (see Section 4.1) happens. 
Our tool identifies those events and automatically transforms the corresponding 
rewrite rules by adding and updating the monitor object. 

EXECUTING. A transaction starts executing when the transaction object appears
in a Replica’s executing attribute in the right-hand side, but not in the left- 
hand side, of a rewrite rule. The monitor then adds a record for this transaction, 
with the proxy and start time, to the log, and increments the logical global clock. 


Example 5. The rewrite rule in Example 3 where a Walter replica is served a
transaction is modified by adding and updating the monitor object:


rl [start-txn] :
   < O@M : Monitor | clock : GT@M, log : LOG@M >
   < RID : Walter-Replica | executing : TRANSES, committedVTS : VTS,
                            gotTxns : < TID : Txn | startVTS : empty > ;; TXNS >
 =>
   < O@M : Monitor | clock : GT@M + 1, log : LOG@M ,
       (TID |-> < RID, GT@M, empty, false, empty, empty >) >
   < RID : Walter-Replica | gotTxns : TXNS,
                            executing : TRANSES < TID : Txn | startVTS : VTS > > .


where the monitor O@M adds a new record for the transaction TID in the log, with
starting time (i.e., the current logical global time) GT@M at its executing server 
RID, finish time (empty), flag (false), read set (empty), and write set (empty). 
The monitor also increments the global clock by one. 


COMMIT. A transaction commits at its proxy when the transaction object 
appears in the proxy’s committed attribute in the right-hand side, but not in 
the left-hand side, of a rewrite rule. The record for that transaction is updated 
with commit status, versions read and written, and commit time, and the global 
logical clock is incremented. 


Example 6. The monitor object is added to the rewrite rule in Example 4 for 
committing a read-only transaction: 


rl [commit-read-only-txn] :
   < O@M : Monitor | clock : GT@M, log : LOG@M ,
       (TID |-> < RID, T@M, VTS@M, FLAG@M, READS@M, WRITES@M >) >
   < RID : Walter-Replica | committed : TRANSES',
                            executing : TRANSES
       < TID : Txn | operations : nil, writeSet : empty, readSet : RS > >
 =>
   < O@M : Monitor | clock : GT@M + 1, log : LOG@M ,
       (TID |-> < RID, T@M, insert(RID, GT@M, VTS@M), true, RS, empty >) >
   < RID : Walter-Replica | committed : (TRANSES' < TID : Txn | >),
                            executing : TRANSES > .


The monitor updates the log for the transaction TID by setting its finish time
at the executing server RID to GT@M (insert(RID, GT@M, VTS@M)), setting the
committed flag to true, setting the read set to RS and the write set to empty (this
is a read-only transaction), and incrementing the global clock.


ABORT. Abort is treated as commit, but the commit flag remains false.


DECIDED. When a transaction’s status is decided remotely, the record for that 
transaction’s decision time at the remote replica is updated with the current 
global time. See [27] for an example. 

We have formalized/implemented the transformation from a Maude specifi- 
cation of a DTS into one with a monitor as a meta-level function monitorRules 
: Module -> Module in Maude. See our longer report [27] for details. 
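
The bookkeeping performed by the monitor can be summarised in a few lines of Python. This sketch mirrors the four cases above but is not the monitorRules transformation itself. The class and method names are assumptions made here, as are the integer clock and the choice to increment the clock on a remote decision, which the text above does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Tuple

KeyVersion = Tuple[str, int]  # (key, version); integer versions are an assumption


@dataclass
class Record:
    proxy: str
    issue_time: int
    finish_time: Dict[str, int] = field(default_factory=dict)  # replica -> time
    committed: bool = False
    reads: FrozenSet[KeyVersion] = frozenset()
    writes: FrozenSet[KeyVersion] = frozenset()


@dataclass
class Monitor:
    clock: int = 0
    log: Dict[str, Record] = field(default_factory=dict)

    def _tick(self) -> int:
        """Return the current logical time and advance the clock by one."""
        now = self.clock
        self.clock += 1
        return now

    def start(self, tid: str, proxy: str) -> None:
        # EXECUTING: new record with proxy and start time, everything else empty.
        self.log[tid] = Record(proxy, self._tick())

    def finish(self, tid: str, committed: bool,
               reads: Iterable[KeyVersion], writes: Iterable[KeyVersion]) -> None:
        # COMMIT or ABORT at the proxy: only the committed flag differs.
        rec = self.log[tid]
        rec.finish_time[rec.proxy] = self._tick()
        rec.committed = committed
        rec.reads, rec.writes = frozenset(reads), frozenset(writes)

    def decided(self, tid: str, replica: str) -> None:
        # DECIDED: record the decision time at a remote replica
        # (incrementing the clock here is an assumption of this sketch).
        self.log[tid].finish_time[replica] = self._tick()


m = Monitor()
m.start("t1", "r1")
m.finish("t1", True, reads={("x", 0)}, writes=set())
m.decided("t1", "r2")
```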


6 Formalizing Consistency Models in Maude 


This section formalizes the consistency properties in Section 3 as functions on 
the “history log” of a completed run. The entire Maude specification of these 
functions is available at https://github.com/siliunobi/cat. Due to space restric- 
tions, we only show the formalization of four of the consistency models, and refer 
to our report [27] for the formalization of the other properties. 


Read Committed (RC). (A transaction cannot read any writes by uncommit- 
ted transactions.) Note that standard definitions for single-version databases 
disallow reading versions that are not committed at the time of the read. We 
follow the definition for multi-versioned systems by Adya, summarized by Bailis 
et al. [5], that defines the RC property as follows: (i) a committed transaction 
cannot read a version that was written by an aborted transaction; and (ii) a 
transaction cannot read intermediate values: that is, if T writes two versions 
<X,V> and <X,V’> with V < V’, then no T’ ≠ T can read <X,V>.

The first equation defining the function rc, specifying when RC holds, checks 
whether some (committed) transaction TID1 read version V of key X (i.e., <X,V> 
is in TID1’s read set <X,V> , RS, where RS matches the rest of TID1’s read set), and
this version V was written by some transaction TID2 that was never committed
(i.e., TID2’s commit flag is false, and its write set is <X,V> , WS’). The second
equation checks whether there was an intermediate read of a version <X,V> that
was overwritten by the same transaction TID2 that wrote the version:⁴


op rc : Log -> Bool .

eq rc(TID1 |-> < O, T, VT, true, (< X,V >, RS), WS >,
      TID2 |-> < O', T', VT', false, RS', (< X,V >, WS') >, LOG) = false .
ceq rc(TID1 |-> < O, T, VT, true, (< X,V >, RS), WS >,
       TID2 |-> < O', T', VT', true, RS', (< X,V >, < X,V' >, WS') >,
       LOG) = false if V < V' .
eq rc(LOG) = true [owise] .
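

For comparison, the same check can be written as explicit loops over a log. The Python sketch below is a counterpart of the equations above under stated assumptions, not the authors' specification: the log is a dictionary from transaction identifiers to a Record tuple introduced here, and versions are integers ordered by <.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

KeyVersion = Tuple[str, int]  # (key, version); integer versions are an assumption


class Record(NamedTuple):
    proxy: str
    issue_time: int
    finish_time: Dict[str, int]
    committed: bool
    reads: FrozenSet[KeyVersion]
    writes: FrozenSet[KeyVersion]


def rc(log: Dict[str, Record]) -> bool:
    """Return True iff no committed transaction violates read committed."""
    for rid, reader in log.items():
        if not reader.committed:
            continue
        for wid, writer in log.items():
            if wid == rid:
                continue  # the two map entries in the equations are distinct
            for key, version in reader.reads & writer.writes:
                if not writer.committed:
                    return False  # read a version written by an aborted transaction
                if any(k == key and version < v for k, v in writer.writes):
                    return False  # read an intermediate version of the writer
    return True


def txn(committed, reads=(), writes=()):
    """Helper for the examples; proxy and times are irrelevant for RC."""
    return Record("r1", 0, {}, committed, frozenset(reads), frozenset(writes))


aborted_read = {"t1": txn(False, writes=[("x", 1)]),
                "t2": txn(True, reads=[("x", 1)])}
intermediate_read = {"t1": txn(True, writes=[("x", 1), ("x", 2)]),
                     "t2": txn(True, reads=[("x", 1)])}
clean = {"t1": txn(True, writes=[("x", 1)]),
         "t2": txn(True, reads=[("x", 1)])}
```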


⁴ The configuration union and the union operator ‘,’ for maps and sets are declared
associative and commutative. The first equation therefore matches any log where
some committed transaction read a key-version pair written by some aborted
transaction.

Read Atomicity (RA). A system guarantees RA if it prevents fractured reads and
prevents transactions from reading uncommitted or aborted data. A transaction
$T_j$ exhibits fractured reads if transaction $T_i$ writes versions $x_m$ and $y_n$, $T_j$ reads
version $x_m$ and version $y_k$, and $k < n$ [5]. The function fracRead checks whether
there are fractured reads in the log. There is a fractured read if a transaction
TID1 reads X and Y, transaction TID2 writes X and Y, TID1 reads the version VX
of X written by TID2, and reads a version VY’ of Y written before VY (VY’ < VY):


op fracRead : Log -> Bool .
ceq fracRead(TID1 |-> < O, T, VT, true, (< X,VX >, < Y,VY' >, RS), WS >,
             TID2 |-> < O', T', VT', true, RS', (< X,VX >, < Y,VY >, WS') >, LOG)
  = true if VY' < VY .
eq fracRead(LOG) = false [owise] .


We define RA as the combination of RC and no fractured reads: 


op ra : Log -> Bool . 
eq ra(LOG) = rc(LOG) and not fracRead(LOG) .
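

A loop-based reading of fracRead and ra may help when comparing against the equations. The Python sketch below rests on the same assumptions as before (a dictionary log, a Record tuple introduced here, integer versions). To stay self-contained, ra takes the RC check as a parameter rather than redefining it.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple, Tuple

KeyVersion = Tuple[str, int]  # (key, version); integer versions are an assumption


class Record(NamedTuple):
    proxy: str
    issue_time: int
    finish_time: Dict[str, int]
    committed: bool
    reads: FrozenSet[KeyVersion]
    writes: FrozenSet[KeyVersion]


Log = Dict[str, Record]


def frac_read(log: Log) -> bool:
    """Return True iff some committed reader saw only part of a writer's updates."""
    for rid, reader in log.items():
        for wid, writer in log.items():
            if rid == wid or not (reader.committed and writer.committed):
                continue
            # shared plays the role of <X,VX>: read by reader, written by writer.
            for shared in reader.reads & writer.writes:
                for key, seen in reader.reads - {shared}:
                    for k, written in writer.writes - {shared}:
                        if k == key and seen < written:
                            return True  # reader saw an older version of key
    return False


def ra(log: Log, rc: Callable[[Log], bool]) -> bool:
    """Read atomicity: read committed and no fractured reads."""
    return rc(log) and not frac_read(log)


def txn(reads=(), writes=()):
    """Helper for the examples: a committed transaction at a dummy site."""
    return Record("r1", 0, {}, True, frozenset(reads), frozenset(writes))


fractured = {"t1": txn(writes=[("x", 1), ("y", 1)]),
             "t2": txn(reads=[("x", 1), ("y", 0)])}
atomic = {"t1": txn(writes=[("x", 1), ("y", 1)]),
          "t2": txn(reads=[("x", 1), ("y", 1)])}
```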


Parallel snapshot isolation (PSI) is given by three properties [36]: 


— PSI-1 (site snapshot read): All operations read the most recent committed
version at the transaction’s site as of the time when the transaction began.

— PSI-2 (no write-write conflicts): The write sets of each pair of committed
somewhere-concurrent⁵ transactions must be disjoint.

— PSI-3 (commit causality across sites): If a transaction $T_1$ commits at a site
$S$ before a transaction $T_2$ starts at site $S$, then $T_1$ cannot commit after $T_2$ at
any site.


The function notSiteSnapshotRead detects violations of PSI-1 in the system log:
it returns true if there is a transaction that did not read the most
recent committed version at its executing site when it began: 


op notSiteSnapshotRead : Log -> Bool .

ceq notSiteSnapshotRead(
      TID1 |-> < RID1, T, VT1, true, (< X,V >, RS1), WS1 >,
      TID2 |-> < RID2, T', (RID1 |-> T2, VT2), true, RS2, (< X,V >, WS2) >,
      TID3 |-> < RID3, T'', (RID1 |-> T3, VT3), true, RS3, (< X,V' >, WS3) >,
      LOG) = true if V =/= V' /\ T3 < T /\ T3 > T2 .

ceq notSiteSnapshotRead(
      TID1 |-> < RID1, T, VT1, true, (< X,V >, RS1), WS1 >,
      TID2 |-> < RID2, T', (RID1 |-> T2, VT2), true, RS2, (< X,V >, WS2) >,
      LOG) = true if T < T2 .

eq notSiteSnapshotRead(LOG) = false [owise] .


⁵ Two transactions are somewhere-concurrent if they are concurrent at one of their
sites.

In the first equation, the transaction TID1, hosted at site RID1, has in its read set
a version <X,V> written by TID2. Some transaction TID3 wrote version <X,V'>
and was committed at RID1 after TID2 was committed at RID1 (T3 > T2) and 
before TID1 started executing (T3 < T). Hence, the version read by TID1 was 
stale. The second equation checks if TID1 read some version that was committed 
at RID1 after TID1 started (T < T2). 
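
The two cases can be mirrored in a small Python sketch. It is only an illustration under assumptions: the `Txn` record, its field names, and `not_site_snapshot_read` are hypothetical stand-ins for the Maude log entries, and times are plain integers.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    site: str                      # executing site (proxy)
    start: int                     # start time at its site
    commit_at: dict                # site -> local commit time
    committed: bool = True
    reads: set = field(default_factory=set)   # {(key, version)}
    writes: set = field(default_factory=set)  # {(key, version)}

def not_site_snapshot_read(log):
    txns = [t for t in log.values() if t.committed]
    for r in txns:
        for x, v in r.reads:
            for w in txns:
                if w is r or (x, v) not in w.writes or r.site not in w.commit_at:
                    continue
                t2 = w.commit_at[r.site]
                if r.start < t2:
                    return True  # second case: version committed after r began
                for o in txns:
                    if o is r or o is w or r.site not in o.commit_at:
                        continue
                    t3 = o.commit_at[r.site]
                    overwrote = any(k == x and v2 != v for k, v2 in o.writes)
                    if overwrote and t2 < t3 < r.start:
                        return True  # first case: r read a stale version
    return False

stale = {
    "r": Txn("s1", 10, {"s1": 12}, reads={("x", 1)}),
    "w": Txn("s2", 1, {"s2": 2, "s1": 3}, writes={("x", 1)}),
    "o": Txn("s2", 4, {"s2": 5, "s1": 6}, writes={("x", 2)}),
}
fresh = {
    "r": Txn("s1", 10, {"s1": 12}, reads={("x", 1)}),
    "w": Txn("s2", 1, {"s2": 2, "s1": 3}, writes={("x", 1)}),
}
```

In `stale`, transaction `o` overwrote `x` at site `s1` between the commit of `w` and the start of `r`, so the check reports a violation; in `fresh` it does not.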

The function someWhereConflict checks whether PSI-2 holds by looking 
for a write-write conflict between any pair of committed somewhere-concurrent 
transactions in the system log: 


op someWhereConflict : Log -> Bool .

ceq someWhereConflict(
      TID1 |-> < RID1, T, (RID1 |-> T1, VT1), true, RS, (< X, V >, WS) >,
      TID2 |-> < RID2, T', (RID1 |-> T2, VT2), true, RS', (< X, V' >, WS') >,
      LOG) = true if T2 > T /\ T2 < T1 .

eq someWhereConflict(LOG) = false [owise] .


The above function checks whether the transactions with the write conflict are 
concurrent at the transaction TID1’s proxy RID1. Here, TID2 commits at RID1 at 
time T2, which is between TID1’s start time T and its commit time T1 at RID1. 
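
A rough Python counterpart of this check is sketched below. The record layout and the name `somewhere_conflict` are assumptions made for illustration; they are not taken from the Maude model.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    site: str                      # proxy site
    start: int                     # start time at the proxy
    commit_at: dict                # site -> local commit time
    committed: bool = True
    writes: set = field(default_factory=set)  # {(key, version)}

def somewhere_conflict(log):
    txns = [t for t in log.values() if t.committed]
    for a in txns:
        if a.site not in a.commit_at:
            continue
        for b in txns:
            if b is a or a.site not in b.commit_at:
                continue
            shared = {k for k, _ in a.writes} & {k for k, _ in b.writes}
            # b commits at a's proxy strictly between a's start and commit
            if shared and a.start < b.commit_at[a.site] < a.commit_at[a.site]:
                return True
    return False

conflict = {
    "a": Txn("s1", 1, {"s1": 5}, writes={("x", 1)}),
    "b": Txn("s2", 2, {"s2": 3, "s1": 4}, writes={("x", 2)}),
}
disjoint = {
    "a": Txn("s1", 1, {"s1": 5}, writes={("x", 1)}),
    "b": Txn("s2", 2, {"s2": 3, "s1": 4}, writes={("y", 2)}),
}
```

Both logs contain transactions that overlap at site `s1`, but only `conflict` has them writing the same key.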

The function notCausality analyzes PSI-3 by checking whether there was 
a “bad situation” in which a transaction TID1 committed at site RID2 before a 
transaction TID2 started at site RID2 (T1 < T2), while TID1 committed at site 
RID after TID2 committed at site RID (T3 > T4): 


op notCausality : Log -> Bool .

ceq notCausality(
      TID1 |-> < RID1, T, (RID2 |-> T1, RID |-> T3, VT2), true, RS, WS >,
      TID2 |-> < RID2, T2, (RID |-> T4, VT4), true, RS', WS' >,
      LOG) = true if T1 < T2 /\ T3 > T4 .

eq notCausality(LOG) = false [owise] .
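
For illustration, the "bad situation" can also be detected with a few lines of Python. This is a sketch under assumptions, not the Maude definition: the `Txn` record and `not_causality` are hypothetical names, and each transaction carries a dictionary of per-site commit times.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    site: str          # site where the transaction started
    start: int         # start time at that site
    commit_at: dict    # site -> local commit time
    committed: bool = True

def not_causality(log):
    txns = [t for t in log.values() if t.committed]
    for a in txns:
        for b in txns:
            if a is b or b.site not in a.commit_at:
                continue
            if not a.commit_at[b.site] < b.start:
                continue  # a did not commit at b's site before b started
            for s, t4 in b.commit_at.items():
                # a must not commit after b at any other site
                if s != b.site and s in a.commit_at and a.commit_at[s] > t4:
                    return True
    return False

violating = {
    "a": Txn("s1", 0, {"s1": 1, "s2": 1, "s3": 9}),
    "b": Txn("s2", 2, {"s2": 3, "s3": 5}),
}
causal = {
    "a": Txn("s1", 0, {"s1": 1, "s2": 1, "s3": 4}),
    "b": Txn("s2", 2, {"s2": 3, "s3": 5}),
}
```

In `violating`, `a` commits at `s2` before `b` starts there, yet commits at `s3` after `b` does.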


PSI can then be defined by combining the above three properties: 


op psi : Log -> Bool .
eq psi(LOG) = not notSiteSnapshotRead(LOG) and
              not someWhereConflict(LOG) and not notCausality(LOG) .


Non-monotonic snapshot isolation (NMSI) is the same as PSI except that a
transaction may read a version committed even after the transaction begins [3].
NMSI can therefore be defined as the conjunction of PSI-2 and PSI-3:


op nmsi : Log -> Bool .
eq nmsi(LOG) = not someWhereConflict(LOG) and not notCausality(LOG) .

Serializability (SER) means that the concurrent execution of transactions is
equivalent to executing them in some (non-overlapping in time) sequence [33].

A formal definition of SER is based on direct serialization graphs (DSGs): 
an execution is serializable if and only if the corresponding DSG is acyclic. Each 
node in a DSG corresponds to a committed transaction, and directed edges in a 
DSG correspond to the following types of direct dependencies [2]: 


— Read dependency: Transaction $T_j$ directly read-depends on transaction $T_i$ if
$T_i$ writes some version $x_i$ and $T_j$ reads that version $x_i$.

— Write dependency: Transaction $T_j$ directly write-depends on transaction $T_i$ if
$T_i$ writes some version $x_i$ and $T_j$ writes $x$’s next version after $x_i$ in the version
order.

— Antidependency: Transaction $T_j$ directly antidepends on transaction $T_i$ if $T_i$
reads some version $x_k$ and $T_j$ writes $x$’s next version after $x_k$.


There is a directed edge from a node $T_i$ to another node $T_j$ if transaction $T_j$
directly read-/write-/antidepends on transaction $T_i$.
The dependencies/edges can easily be extracted from our log as follows:


— If there is a key-version pair <X , V> both in T2’s read set and in T1’s write 
set, then T2 read-depends on T1. 

— If T1 writes <X,V1> and T2 writes <X,V2>, and V1 < V2, and there is no
version <X, V> with V1 < V < V2, then T2 write-depends on T1.

— T2 antidepends on T1 if <X, V1> is in T1’s read set, <X, V2> is in T2’s write 
set with V1 < V2 and there is no version <X, V> such that V1 < V < V2. 


We have defined a data type Dsg for DSGs, a function dsg : Log -> Dsg that 
constructs the DSG from a log, and a function cycle : Dsg -> Bool that checks 
whether a DSG has cycles. We refer to [27] for their definition in Maude. 

SER then holds if there is no cycle in the constructed DSG: 


op ser : Log -> Bool . 
eq ser(LOG) = not cycle(dsg(LOG)) . 
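
The Maude definitions of dsg and cycle are given in [27] and are not reproduced here. As a rough illustration of the idea only, the following Python sketch extracts the three kinds of edges from a simplified log and looks for a cycle with a depth-first search. The log layout (a dictionary from transaction id to a pair of read and write sets), the integer versions, and all function names are assumptions of the sketch.

```python
from collections import defaultdict

def dsg(log):
    """log: tid -> (reads, writes), each a set of (key, version) pairs,
    containing committed transactions only."""
    versions = defaultdict(set)
    for _, writes in log.values():
        for k, v in writes:
            versions[k].add(v)

    def is_next(k, v1, v2):
        # v2 is the next written version of key k after v1
        return v1 < v2 and not any(v1 < v < v2 for v in versions[k])

    edges = defaultdict(set)
    for t1, (r1, w1) in log.items():
        for t2, (r2, w2) in log.items():
            if t1 == t2:
                continue
            read_dep = bool(w1 & r2)
            write_dep = any(k1 == k2 and is_next(k1, v1, v2)
                            for k1, v1 in w1 for k2, v2 in w2)
            anti_dep = any(k1 == k2 and is_next(k1, v1, v2)
                           for k1, v1 in r1 for k2, v2 in w2)
            if read_dep or write_dep or anti_dep:
                edges[t1].add(t2)  # t2 depends on t1
    return edges

def has_cycle(edges):
    state = {}  # node -> 1 (on the DFS stack) or 2 (finished)
    def visit(n):
        state[n] = 1
        for m in edges.get(n, ()):
            if state.get(m) == 1 or (m not in state and visit(m)):
                return True
        state[n] = 2
        return False
    return any(n not in state and visit(n) for n in list(edges))

def ser(log):
    return not has_cycle(dsg(log))

# Write skew: both transactions read x and y at version 0, then write different keys.
write_skew = {
    "t1": ({("x", 0), ("y", 0)}, {("x", 1)}),
    "t2": ({("x", 0), ("y", 0)}, {("y", 1)}),
}
serial = {
    "t1": (set(), {("x", 1)}),
    "t2": ({("x", 1)}, {("x", 2)}),
}
```

The `write_skew` log yields two antidependency edges that form a cycle, so `ser` rejects it, while `serial` has only edges from `t1` to `t2`.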


7 Formal Analysis of Consistency Properties of DTSs 


We have implemented the Consistency Analysis Tool (CAT) that automates the 
method in this paper. CAT takes as input: 


— A Maude model of the DTS specified as explained in Section 4. 

— The number of each of the following parameters: read-only, write-only, and 
read-write transactions; operations for each type of transaction; keys; replicas 
per key; clients; and servers. The tool analyzes the desired property for all 
initial states with the number of each of these parameters. 

— The consistency property to be analyzed. 

Given these inputs, CAT performs the following steps:


1. adds the monitoring mechanism to the user-provided system model; 

2. generates all possible initial states with the user-provided number of the
different parameters; and

3. executes the following command to search, from all generated initial states, 
for one reachable final state where the consistency property does not hold: 


search [1] init =>! C:Configuration 
< M:Oid : Monitor | log: LOG:Log clock: N:Nat > 
such that not consistency-property(LOG:Log) . 


where init and consistency-property are parametric functions that are
instantiated by the user inputs; e.g., consistency-property is replaced by the
corresponding function rc, psi, nmsi, ..., or ser, depending on which property to analyze.


CAT outputs either “No solution,” meaning that all runs from all the given 
initial states satisfy the desired consistency property, or a counterexample (in 
Maude at the moment) showing a behavior that violates the property. 


Table 1. Model checking results w.r.t. consistency properties. “✓”, “×”, and “-” refer
to satisfying the property, violating the property, and “not applicable,” respectively.


Maude Model  | LOC | RC | RA | CS | UA | NMSI | PSI | SI | SER | SSER
RAMP-F [29]  | 330 | ✓  | ✓  | ×  | ×  | -    | -   | ×  | ×   | ×
Faster [24]  | 300 | ✓  | ×  | ×  | ×  | -    | -   | ×  | ×   | ×
ROLA [25]    | 410 | ✓  | ✓  | ✓  | ✓  | -    | -   | ×  | ×   | ×
Jessy [28]   | 490 | ✓  | ✓  | ✓  | ✓  | ✓    | ×   | ×  | ×   | ×
Walter [26]  | 830 | ✓  | ✓  | ✓  | ✓  | ✓    | ✓   | ×  | ×   | ×
P-Store [32] | 440 | ✓  | ✓  | ✓  | ✓  | ✓    | ✓   | ✓  | ✓   | ×


We have applied our tool to 14 Maude models of state-of-the-art academic 
DTSs (different variants of RAMP and Walter, ROLA, Jessy, and P-Store) 
against all nine properties. Table 1 only shows six case studies due to space
limitations. All model checking results are as expected. It is worth remarking 
that our automatic analysis found all the violations of properties that the respec- 
tive systems should violate. There are also some cases where model checking is 
not applicable (“-” in Table 1): some system models do not include a mechanism 
for committing a transaction on remote servers (i.e., no commit time on any 
remote server is recorded by the monitor). Thus, model checking NMSI or PSI 
is not applicable. 

We have performed our analysis with different initial states, with up to 4 
transactions, 4 operations per transaction, 2 clients, 2 servers, 2 keys, and 2 
replicas per key. Each analysis command took about 15 minutes (worst case) to 
execute on a 2.9 GHz Intel 4-Core i7-3520M CPU with 3.6 GB memory. 

8 Related Work


Formalizing Consistency Properties in a Single Framework. Adya [2] uses
dependencies between reads and writes to define different isolation models in
database systems. Bailis et al. [5] adopt this model to define read atomicity.
Burckhardt et al. [11] and Cerone et al. [12] propose axiomatic specifications of
consistency models for transaction systems using visibility and arbitration
relationships. Shapiro et al. [35] propose a classification along three dimensions
(total order, visibility, and transaction composition) for transactional consistency
models. Crooks et al. [15] formalize transactional consistency properties in terms of
observable states from a client’s perspective. On the non-transactional side,
Burckhardt [10] focuses on session and eventual consistency models. Viotti et al. [38]
expand his work by covering more than 50 non-transactional consistency
properties. Szekeres et al. [37] propose a unified model based on result visibility to
formalize both transactional and non-transactional consistency properties.

All of these studies propose semantic models of consistency properties suit- 
able for theoretical analysis. In contrast, we aim at algorithmic methods for auto- 
matically verifying consistency properties based on executable specifications of 
both the systems and their consistency models. Furthermore, none of the studies 
covered all of the transactional consistency models considered in this paper. 


Model Checking Distributed Transaction Systems. There is very little work on 
model checking state-of-the-art DTSs, maybe because the complexity of these 
systems requires expressive formalisms. Engineers at Amazon Web Services suc- 
cessfully used TLA+ to model check key algorithms in Amazon’s Simple Storage 
Systems and DynamoDB database [31]; however, they do not state which consis- 
tency properties, if any, were model checked. The designers of the TAPIR trans- 
action protocol have specified and model checked correctness properties of their 
design using TLA+ [41]. The IronFleet framework [20] combines TLA+ analy- 
sis and Floyd-Hoare-style imperative verification to reason about protocol-level 
concurrency and implementation complexities, respectively. Their methodology 
requires “considerable assistance from the developer” to perform the proofs. 

Distributed model checkers [22,40] are used to model check implementations 
of distributed systems such as Cassandra, ZooKeeper, the BerkeleyDB database 
and a replication protocol implementation. 

Our previous work [8, 18, 19, 24-26, 28, 29, 32] specifies and model checks single 
DTSs and consistency properties in different ways, as opposed to in a single 
framework that, furthermore, automates the “monitoring” and analysis process. 


Other Formal Reasoning about Distributed Database Systems. Cerone et al. [13] 
develop a new characterization of SI and apply it to the static analysis of DTSs. 
Bernardi et al. [7] propose criteria for checking the robustness of transactional 
programs against consistency models. Bouajjani et al. [9] propose a formal def- 
inition of eventual consistency, and reduce the problem of checking eventual 
consistency to reachability and model checking problems. Gotsman et al. [17] 
propose a proof rule for reasoning about non-transactional consistency choices. 

There is also work [23,34,39] that focuses on specifying, implementing and
verifying distributed systems using the Coq proof assistant. Their executable Coq 
“implementations” can be seen as executable high-level formal specifications, but 
the theorem proving requires nontrivial user interaction. 


9 Concluding Remarks 


In this paper we have provided an object-based framework for formally model- 
ing distributed transaction systems (DTSs) in Maude, have explained how such 
models can be automatically instrumented to record relevant events during a 
run, and have formally defined a wide range of consistency properties on such 
histories of events. We have implemented a tool which automates the entire 
instrumentation and model checking process. Our framework is very general: 
we could easily adapt previous Maude models of state-of-the-art DTSs such as 
Apache Cassandra, P-Store, RAMP, Walter, Jessy, and ROLA to our framework. 

We then model checked the DTSs w.r.t. all the consistency properties for all 
initial states with 4 transactions, 2 sites, and so on. This analysis was sufficient 
to differentiate the DTSs according to which consistency properties they satisfy. 

In future work we should formally relate our definitions of the consistency 
properties to other (non-executable) formalizations of consistency properties. We 
should also extend our work to formalizing and model checking non-transactional 
consistency properties for key-value stores such as Cassandra. 


Abstract. Saturation is an efficient exploration order for computing the
set of reachable states symbolically. Attempts to parallelize saturation 
have so far resulted in limited speedup. We demonstrate for the first time 
that on-the-fly symbolic saturation can be successfully parallelized at a 
large scale. To this end, we implemented saturation in Sylvan’s multi- 
core decision diagrams used by the LTSmin model checker. 

We report extensive experiments, measuring the speedup of paral- 
lel symbolic saturation on a 48-core machine, and compare it with the 
speedup of parallel symbolic BFS and chaining. We find that the parallel 
scalability varies from quite modest to excellent. We also compared the 
speedup of on-the-fly saturation and saturation for pre-learned transition 
relations. Finally, we compared our implementation of saturation with 
the existing sequential implementation based on Meddly. 

The empirical evaluation uses Petri nets from the model checking 
contest, but thanks to the architecture of LTSmin, parallel on-the-fly 
saturation is now available to multiple specification languages. Data or 
code related to this paper is available at: [34]. 


1 Introduction 


Model checking is an exhaustive algorithm to verify that a finite model of a 
concurrent system satisfies certain temporal properties. The main challenge is 
to handle the large state space, resulting from the combination of parallel com- 
ponents. Symbolic model checking exploits regularities in the set of reachable 
states, by storing this set concisely in a decision diagram. In asynchronous sys- 
tems, transitions have locality, i.e. they affect only a small part of the state 
vector. This locality is exploited in the saturation strategy, which is probably 
the most efficient strategy to compute the set of reachable states. 

In this paper, we investigate the efficiency and speedup of a new parallel
implementation of saturation, aiming at a multi-core, shared-memory imple- 
mentation. The implementation is carried out in the parallel decision diagram 
framework Sylvan [16], in the language-independent model checker LTSmin [22]. 
We empirically evaluate the speedup of parallel saturation on Petri nets from 
the Model Checking Contest [24], running the algorithm on up to 48 cores. 


1.1 Related Work 


The saturation strategy has been developed and improved by Ciardo et al. We 
refer to [13] for an extensive description of the algorithm. Saturation derives 
its efficiency from firing all local transitions that apply at a certain level of the 
decision diagram, before proceeding to the next higher level. An important step 
in the development of the saturation algorithm allows on-the-fly generation of 
the transition relations, without knowing the cardinality of the state variable 
domains in advance [12]. This is essential to implement saturation in LTSmin,
which is based on the PINS interface to discover transitions on-the-fly. 

Since saturation obtains its efficiency from a restrictive firing order, it seems 
inherently sequential. Yet the problem of parallelising saturation has been stud- 
ied intensively. The first attempt, Saturation NOW [9], used a network of 
PCs. This version could exploit the collective memory of all PCs, but due to 
the sequential procedure, no speedup was achieved. By firing local transitions 
speculatively (but with care to avoid memory waste), some speedup has been 
achieved [10]. More relevant to our work is the parallelisation of saturation for 
a shared memory architecture [20]. The authors used CILK to schedule par- 
allel work originating from firing multiple transitions at the same level. They 
reported some speedup on a dual-core machine, at the expense of a serious 
memory increase. Their method also required to precompute the transition rela- 
tion. An improvement of the parallel synchronisation mechanism was provided 
in [31]. They reported a parallel speedup of 2x on 4 CPUs. Moreover, their 
implementation supports learning the transition relation on-the-fly. Still, the 
successful parallelisation of saturation remained widely open, as indicated by 
Ciardo [14]: “Parallel symbolic state-space exploration is difficult, but what is 
the alternative?” 

For an extensive overview of parallel decision diagrams on various hardware 
architectures, see [15]. Here we mention some other approaches to parallel sym- 
bolic model checking, different from saturation for reachability analysis. First, 
Grumberg and her team [21] designed a parallel BDD package based on ver- 
tical partitioning. Each worker maintains its own sub-BDD. Workers exchange 
BDD nodes over the network. They reported some speedup on 32 PCs for BDD 
based model checking under the BFS strategy. The Sylvan [16] multi-core deci- 
sion diagram package supports symbolic on-the-fly reachability analysis, as well 
as bisimulation minimisation [17]. Oortwijn [28] experimented with a heteroge- 
neous distributed/multi-core architecture, by porting Sylvan’s architecture to 
RDMA over MPI, running symbolic reachability on 480 cores spread over 32 
PCs and reporting speedups of BFS symbolic reachability up to 50. Finally, 
we mention some applications of saturation beyond reachability, such as model
checking CTL [32] and detecting strongly connected components to detect fair 
cycles [33]. 


1.2 Contribution 


Here we show that implementing saturation on top of the multi-core decision
diagram framework Sylvan [16] yields a considerable speedup in a shared-memory
setting of up to 32.5x on 48 cores with pre-learned transition relations, and 52.2x
with on-the-fly transition learning.

By design decision, our implementation reuses several features provided by 
Sylvan, such as: its own fine-grained, work-stealing framework Lace [18], its 
implementation of both BDDs (Binary Decision Diagrams) and LDDs (a List- 
implementation of Multiway Decision Diagrams), its concurrent unique table and 
operations cache, and finally, its parallel operations like set union and relational 
product. As a consequence, the pseudocode of the algorithm and additional
code for saturation are quite small, and orthogonal to other BDD features. To
improve orthogonality with the existing decision diagrams, we deviated from 
the standard presentation of saturation [13]: we never update BDD nodes in 
situ, and we eliminated the mutual recursion between saturation and the BDD 
operations for relational product to fire transitions. 

The implementation is available in the open-source high-performance model 
checking tool LTSmin [22], with its language-agnostic interface, Partitioned
Next-State Interface (PINS) [5,22,25]. Here, a specification basically provides a
next-state function equipped with dependency information, from which LTSmin
can derive locality information. We fully support the flexible method of learning 
the transition relation on-the-fly during saturation [12]. As a consequence, our 
contribution extends the tool LTSmin with saturation for various specification 
languages, like Promela, DVE, Petri nets, mCRL2, and languages supported by 
the ProB model checker. See Sect. 4 on how to use saturation in LTSmin. 

The experiments with saturation in Sylvan are carried out in LTSmin as 
well. We used Petri nets from the MCC competition. Our experimental design 
has been carefully set up in order to facilitate fair comparisons. Besides learning 
the transition relation on-the-fly, we also pre-learned it in order to measure
the overhead of learning and to eliminate its effect in comparisons. It is well
known that the variable ordering has a large effect on the BDD sizes [29]. Hence, 
our experiments are based on two of the best static variable orderings known, 
Sloan [26] and Force [1]. In particular, our experiments measure and compare: 


— The performance of our parallel algorithm with one worker, compared to a
state-of-the-art sequential implementation of saturation in Meddly [4].

— The parallel speedup of our algorithm on 16 cores, and for specific examples 
up to 48 cores. 

— The efficiency and speedup of saturation compared to the BFS and chaining 
strategies for reachability analysis. 

— The effect of choosing Binary Decision Diagrams or List Decision Diagrams. 

— The effect of choosing Sloan or Force to compute static variable orders. 

2 Preliminaries


This paper proposes an algorithm for decision diagrams to perform the fixed
point application of multiple transition relations according to the saturation
strategy, combined with on-the-fly transition learning as implemented in
LTSmin. We briefly review these concepts in the following.


2.1 Partitioned Transition Systems 


A transition system (TS) is a tuple $(S, \to, s^0)$, where $S$ is a set of states,
$\to \subseteq S \times S$ is a transition relation and $s^0 \in S$ is the initial state. We define $\to^*$
to be the reflexive and transitive closure of $\to$. The set of reachable states is
$R = \{ s \in S \mid s^0 \to^* s \}$. The goal of this work is to compute $R$ via a novel
multi-core saturation strategy.

In this paper, we evaluate multi-core saturation using Petri nets. Figure 1
shows an example of a (safe) Petri net. We show its initial marking, which is
the initial state. A Petri net transition can fire if there is a token in each of its
source places. On firing, these tokens are consumed and tokens in each target
place are generated. For example, $t_1$ will produce one token in both $p_2$ and $p_5$,
if there is a token in $p_4$. Transition $t_6$ requires a token in both $p_3$ and $p_1$ to
fire. The markings of this Petri net form the states of the corresponding TS, so
here $|S| = 2^5 = 32$. From the initial marking shown, four more markings are
reachable, connected by 10 enabled transition firings. This means $|R| = 5$ and
$|\to| = 10$.
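
The counts above can be reproduced with a small explicit-state search. The Python sketch below is illustrative only and rests on assumptions: the arcs of $t_1$ and $t_6$ follow the text, but the directions of $t_2$ to $t_5$ and the initial marking (a single token in $p_4$) are guesses consistent with the dependency matrix, since the figure itself is not available here.

```python
from collections import deque

# transition -> (places consumed, places produced)
NET = {
    "t1": ({"p4"}, {"p2", "p5"}),   # as described in the text
    "t2": ({"p2"}, {"p3"}),         # assumed direction
    "t3": ({"p3"}, {"p2"}),         # assumed direction
    "t4": ({"p5"}, {"p1"}),         # assumed direction
    "t5": ({"p1"}, {"p5"}),         # assumed direction
    "t6": ({"p1", "p3"}, {"p4"}),   # as described in the text
}
INITIAL = frozenset({"p4"})         # assumed initial marking

def successors(marking):
    """Yield (transition, next marking) for every enabled transition."""
    for name, (pre, post) in NET.items():
        if pre <= marking:
            yield name, frozenset((marking - pre) | post)

def reachable(initial):
    """Breadth-first search; returns the reachable markings and the
    number of transition firings between them."""
    seen, firings, todo = {initial}, 0, deque([initial])
    while todo:
        marking = todo.popleft()
        for _, nxt in successors(marking):
            firings += 1
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen, firings

R, firings = reachable(INITIAL)
```

Under these assumptions the search finds five markings and ten firings, matching $|R| = 5$ and $|\to| = 10$.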

Notice that transitions in Petri nets are quite local; transitions consume 
from, and produce into relatively few places. The firing of a Petri net transition 
is called an event and the number of involved places is known as the degree of 
event locality. This notion is easily defined for other asynchronous specification 
languages and can be computed by a simple control flow graph analysis. 


[Figure: Petri net with places p1 to p5 and transitions t1 to t6]


Fig. 1. Example Petri net

To exploit event locality, saturation requires a disjunctive partitioning of the
transition relation $\to$, giving rise to a Partitioned Transition System (PTS). In
a PTS, states are vectors of length $N$, and $\to$ is partitioned as a union of $M$
transition groups. A natural way to partition a Petri net is by viewing each
transition as a transition group. For Fig. 1 this means we have $N = 5$ and
$M = 6$. After disjunctive partitioning, each transition group depends on very
few entries of the state vector. This allows for efficiently computing the reachable
state space for the large class of asynchronous specification languages. LTSmin
supports commonly used specification languages, like DVE, mCRL2, Promela,
PNML for Petri nets, and languages supported by ProB.

Fig. 2. LDD for {(0, 0), (0, 2), (0, 4), (1, 0), (1, 2), (1, 4), (3, 2), (3, 4), (5, 0), (5, 1), (6, 1)}:
(a) LDD as array; (b) same LDD, internal linked-list representation.


2.2 Decision Diagrams 


Binary decision diagrams (BDDs) are a concise and canonical representation of
Boolean functions $\mathbb{B}^N \to \mathbb{B}$ [7]. A BDD is a rooted directed acyclic graph with
leaves 0 and 1. Each internal node $v$ has a variable label $x_i$, denoted by var(v),
and two outgoing edges labeled 0 and 1, denoted by low(v) and high(v). The
efficiency of reduced, ordered BDDs is achieved by minimizing the structure with
some invariants: The BDD may neither contain equivalent nodes, with the same
var(v), low(v) and high(v), nor redundant nodes, with low(v) = high(v). Also,
the variables must occur according to a fixed ordering along each path.

Multi-valued or multiway decision diagrams (MDDs) generalize BDDs to
finite domains ($\mathbb{N}^N \to \mathbb{B}$). Each internal MDD node with variable $x_i$ now has
$n_i$ outgoing edges, labeled 0 to $n_i - 1$. We use quasi-reduced MDDs with sparse
nodes. In the sparse representation, values with edges to leaf 0 are skipped
from MDD nodes, so outgoing edges must be explicitly labeled with remaining
domain values. Contrary to BDDs, MDDs are usually “quasi-reduced”, meaning
that variables are never skipped. In that case, the variable $x_i$ can be derived
from the depth of the MDD, so it is not stored.

A variation of MDDs are list decision diagrams (LDDs) [5,16], where sparse 
MDD nodes are represented as a linked list. See Fig. 2 for two visual represen- 
tations of the same LDD. Each LDD node contains a value, a “down” edge for 
the corresponding child, and a “right” edge pointing to the next element in the 
list. Each list ends with the leaf 0 and each path from the root downwards ends
with the leaf 1. The values in an LDD are strictly ordered, i.e., the values must 
increase to the “right”. 

LDD nodes have the advantage that common suffixes can be shared: The 
MDD for Fig. 2a requires two more nodes, one for [2,4] and one for [1], because 
edges can only point to an entire MDD node. LDDs suffer from an increased 
memory footprint and inferior memory locality, but their memory management 
is simpler, since each LDD node has a fixed small size. 
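
To make the node layout concrete, here is a minimal Python sketch of an LDD with a unique table. It is an illustration only and not Sylvan's implementation: the names `lookup`, `build`, and `enumerate_set`, and the use of small integers as node identifiers, are assumptions of the sketch.

```python
FALSE, TRUE = 0, 1   # the two leaves
unique = {}          # unique table: (value, down, right) -> node id
nodes = {}           # node id -> (value, down, right)

def lookup(value, down, right):
    """Create or retrieve the unique node with these three fields."""
    key = (value, down, right)
    if key not in unique:
        node_id = len(unique) + 2   # ids 0 and 1 are reserved for the leaves
        unique[key] = node_id
        nodes[node_id] = key
    return unique[key]

def build(tuples):
    """Build an LDD for a set of equal-length tuples."""
    tuples = sorted(set(tuples))
    if not tuples:
        return FALSE
    if len(tuples[0]) == 0:
        return TRUE
    groups = {}
    for t in tuples:
        groups.setdefault(t[0], []).append(t[1:])
    node = FALSE
    # Build the linked list from right to left so values increase to the right.
    for value in sorted(groups, reverse=True):
        node = lookup(value, build(groups[value]), node)
    return node

def enumerate_set(node, prefix=()):
    """Yield all tuples stored in the LDD rooted at node."""
    if node == FALSE:
        return
    if node == TRUE:
        yield prefix
        return
    value, down, right = nodes[node]
    yield from enumerate_set(down, prefix + (value,))
    yield from enumerate_set(right, prefix)

FIG2 = {(0, 0), (0, 2), (0, 4), (1, 0), (1, 2), (1, 4),
        (3, 2), (3, 4), (5, 0), (5, 1), (6, 1)}
root = build(FIG2)
```

Because list suffixes are hash-consed, the sublist for `[2, 4]` built for value 3 is the same node that ends the list `[0, 2, 4]` used by values 0 and 1, which is the suffix sharing described above.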


(a) Natural order               (b) Optimized order

     p1 p2 p3 p4 p5                  p2 p3 p4 p5 p1
t1 |  0  1  0  1  1             t2 |  1  1  0  0  0
t2 |  0  1  1  0  0             t3 |  1  1  0  0  0
t3 |  0  1  1  0  0             t1 |  1  0  1  1  0
t4 |  1  0  0  0  1             t6 |  0  1  1  0  1
t5 |  1  0  0  0  1             t4 |  0  0  0  1  1
t6 |  1  0  1  1  0             t5 |  0  0  0  1  1


Fig. 3. Dependency matrices of Fig. 1.


2.3 Variable Orders and Event Locality 


Good variable orders are crucial for efficient operations on decision diagrams. 
The syntactic variable order from the specification is often inadequate for the 
saturation algorithm to perform well. Hence, finding a good variable order is 
necessary. Variable reordering algorithms use heuristics based on event locality. 
The locality of events can be illustrated with dependency matrices. The size of 
those matrices is M x N, where M is the number of transition groups, and N 
is the length of the state vector. The order of columns in dependency matrices 
determines the order of variables in the DD. Figure 3a shows the natural order 
on places in Fig. 1. A measure of event locality is called event span [29]. Lower 
event span is correlated to a lower number of nodes in decision diagrams. This 
can be seen in LDDs in Figs. 4a and b that are ordered according to columns in 
Figs. 3a and b respectively. 

Fig. 4. Reachable states as LDDs with different orders on places


Event span is defined as the sum over all rows of the distance (counting both
end columns) from the leftmost non-zero column to the rightmost non-zero column.
The event span of Fig. 3a is 22 (= 4+2+2+5+5+4); the event span of Fig. 3b is 16,
which is better.
Optimizing the event span and thus variable order of DDs is NP-complete [6], yet
there are heuristic approaches that run in subquadratic time and provide good
enough orders. Commonly used algorithms are Noack [27], Force [1] and Sloan
[30]. Noack creates a permutation of variables by iteratively minimizing some
objective function. The Force algorithm acts as if there are springs in between
nonzeros in the dependency matrix, and tries to minimize the average tension
among them. Sloan tries to minimize the profile of matrices. In short, profile is
the symmetric counterpart to event span. For a more detailed overview of these
algorithms see [3]. In our empirical evaluation we use both Sloan and Force,
because these have been shown to give the best results [2,26].
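
The event span computation is easy to check mechanically. The following Python sketch recomputes it for the two dependency matrices of Fig. 3; the function name `event_span` and the list-of-lists encoding are choices made for this illustration.

```python
def event_span(matrix):
    """Sum over all rows of the number of columns between the leftmost
    and the rightmost non-zero entry, counting both ends."""
    total = 0
    for row in matrix:
        cols = [i for i, bit in enumerate(row) if bit]
        if cols:
            total += cols[-1] - cols[0] + 1
    return total

# Rows t1..t6, columns p1..p5 (Fig. 3a).
NATURAL = [
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0],
]
# Rows t2, t3, t1, t6, t4, t5, columns p2, p3, p4, p5, p1 (Fig. 3b).
OPTIMIZED = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
```

Calling `event_span(NATURAL)` gives 22 and `event_span(OPTIMIZED)` gives 16, in line with the values stated above.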


2.4 The Saturation Strategy 


The saturation strategy for reachability analysis, i.e., the transitive closure of
transition relations applied to some set of states, was first proposed by Ciardo
et al.; see [11,13] for an overview. Saturation was combined with on-the-fly
transition learning in [12]. Besides reachability, saturation has also been applied
to CTL model checking [32] and in checking fairness constraints with strongly 
connected components [33]. 

Saturation is well-studied. The core idea is to always fire enabled transitions 
at the lower levels in the decision diagram, before proceeding to the next level. 
This tends to keep the intermediate BDD sizes much smaller than for instance the 
breadth-first exploration strategy. This is in particular the case for asynchronous 
systems, where transitions exhibit locality. There is also a major influence from 
the variable reordering: if the variables involved in a transition are grouped 
together, then this transition only affects adjacent levels in the decision diagram. 

We refer to [13] for a precise description of saturation. Our implementation 
deviates from the standard presentation in three ways. First, we implemented 
saturation for LDDs and BDDs, instead of MDDs. Next, we never update nodes 
in the LDD forest in situ; instead, we always create new nodes. Finally, the 
standard representation has a mutual recursion between saturation and firing 
transitions. Instead, we fire transitions using the existing function for relational
product, which is called from our saturation algorithm. As a consequence, the 


Multi-core On-The-Fly Saturation 65 


extension with saturation becomes more orthogonal to the specific decision
diagram implementation. We refer to Sect. 3 for a detailed description of our
algorithm. We show in Sect. 5 that these design decisions do not introduce
computational overhead.


3 Multi-core Saturation Algorithm 


To access the three elements of an LDD node $x$, Sylvan [16] provides the functions
value(x), down(x), right(x). To create or retrieve a unique LDD node using the
hash table, Sylvan provides LookupLDDNode(value, down, right).

Furthermore, Sylvan provides several operations on LDDs that we use to
implement reachability algorithms, such as union(A, B) to compute the set union
$A \cup B$ and minus(A, B) to compute the set difference $A \setminus B$. For transition
relations, Sylvan provides an operation relprod(S, R) to compute the successors of
S with transition relation R, and an operation relprodunion(S, R) that
computes union(S, relprod(S, R)), i.e., computing the successors and adding them
to the given set of states, in one operation. All these operations are internally
parallelized, as described in [16].
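
The meaning of these operations can be illustrated on explicit Python sets. This is only a semantic sketch: Sylvan implements them on decision diagrams and in parallel, whereas here a set of states is a plain `set` and a relation is a set of `(source, target)` pairs.

```python
def union(a, b):
    return a | b

def minus(a, b):
    return a - b

def relprod(states, relation):
    """Successors of `states` under `relation`."""
    return {t for (s, t) in relation if s in states}

def relprodunion(states, relation):
    """Successors of `states`, added to `states`, in one operation."""
    return union(states, relprod(states, relation))

S = {0, 1}
R = {(0, 2), (1, 2), (2, 3), (4, 5)}
```

For these values, `relprod(S, R)` is `{2}` and `relprodunion(S, R)` is `{0, 1, 2}`; state 3 only appears after a second application.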

We implement multi-core saturation as in Algorithm 1. We have a transition
relation disjunctively partitioned into $M$ relations $R_0 \ldots R_{M-1}$. These relations
are sorted by the level (depth) of the decision diagram where they are applied,
which is the first level touched by the relation. We say that relation $R_i$ is applied
at depth $d_i$. We identify the current next relation with a number $k$, $0 \le k \le M$,
where $k = M$ denotes “no next relation”. Decision diagram levels are sequentially
numbered with 0 for the root level.


global: M transition relations R_0 ... R_{M-1} starting at depths d_0 ... d_{M-1}
 1 def saturate(S, k, d):
 2   if S = 0 ∨ S = 1 : return S
 3   if k = M : return S
 4   if result ← cache[(S, k, d)] : return result
 5   if d = d_k :
 6     k' ← next relation k < k' < M where d_{k'} ≠ d, or M
 7     while S changes :
 8       S ← saturate(S, k', d)
 9       for i ∈ [k, k') : S ← relprodunion(S, R_i)
10     result ← S
11   else:
12     do in parallel:
13       right ← saturate(right(S), k, d)
14       down ← saturate(down(S), k, d + 1)
15     result ← LookupLDDNode(value(S), down, right)
16   cache[(S, k, d)] ← result
17   return result


Algorithm 1: The multi-core saturation algorithm, which, given a set of states
S and next transition relation k and current decision diagram depth d,
exhaustively applies all transition relations R_k ... R_{M-1} using the saturation strategy.

The saturate algorithm is given the initial set of states S and the initial 
next transition relation k = 0 and the initial decision diagram level d = 0. The 
algorithm is a straightforward implementation of saturation. First we check the 
easy cases where we reach either the end of an LDD list, where S = 0, or the 
bottom of the decision diagram, where S = 1. If there are no more transition 
relations to apply, then k = M and we can simply return S. When we arrive at 
line 4, the operation is not trivial and we consult the operation cache. 

If the result of this operation was not already in the cache, then we check 
whether we have relations at the current level. Since the relations are sorted by 
the level where they must be applied, we compare the current level d with the 
level d_k of the next relation k. If we have relations at the current level, then we
perform the fixed point computation where we first saturate S for the remaining
relations, starting at relation k', which is the first relation that must be applied
on a deeper level than d, and then apply the relations of the current level, that
is, all R_i where k ≤ i < k'. If no relations match the current level, then we
compute in parallel the results of the suboperations for the LDD of successor
“right” and for the LDD of successor “down”. After obtaining these subresults,
we use LookupLDDNode to compute the final result for this LDD node. Finally, 
we store this result in the operation cache and return it. 
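
The order in which the fixed point is computed can be sketched without decision diagrams. The Python sketch below keeps only the logic of lines 3 and 5 to 10 of Algorithm 1: relations are grouped by depth, and the relations of one group are applied only after the state set has been saturated for all deeper groups. The node traversal, the operation cache and the parallelism are deliberately left out, and states are explicit tuples, so this illustrates the saturation order rather than the LDD algorithm itself. All names and the example relations are assumptions made for illustration.

```python
def relprodunion(states, relation):
    """S together with its successors; `relation` is a set of (source, target) pairs."""
    return states | {dst for (src, dst) in relation if src in states}

def saturate(states, k, relations, depths):
    """Saturate `states` for relations[k:], which must be sorted by depth."""
    m = len(relations)
    if k == m:                        # no next relation
        return states
    # k2 is the first relation after k that is applied at another (deeper) level, or m.
    k2 = k + 1
    while k2 < m and depths[k2] == depths[k]:
        k2 += 1
    while True:
        before = states
        states = saturate(states, k2, relations, depths)   # deeper levels first
        for i in range(k, k2):                             # then the current level
            states = relprodunion(states, relations[i])
        if states == before:                               # S no longer changes
            return states

# Hypothetical 2-bit counter: the low bit is incremented at depth 1,
# the carry into the high bit is applied at depth 0.
carry = {((0, 1), (1, 0))}
low_bit = {((0, 0), (0, 1)), ((1, 0), (1, 1))}
reachable = saturate({(0, 0)}, 0, [carry, low_bit], [0, 1])
```

Each time the carry relation adds a state, the deeper relation is saturated again before the carry is retried, which is the behaviour of the while loop at line 7.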

The do in parallel keyword is implemented with the work-stealing frame- 
work Lace [18], which is embedded in Sylvan [16] and offers the primitives spawn 
and sync to create subtasks and wait for their completion. The implementation 
using spawn and sync of lines 12-14 is as follows. 

12     spawn(saturate(right(S), k, d))
13     down ← saturate(down(S), k, d + 1)
14     right ← sync()
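
The spawn/sync pattern can be mimicked in Python as a rough sketch, assuming a thread pool in place of a work-stealing scheduler: one subtask is submitted to the pool, the other is computed by the current thread, and the result of the first is awaited afterwards. This is not Lace's API, the binary-tree example and all names are hypothetical, and spawning is limited to the top two levels because a blocking wait in a fixed-size pool could otherwise deadlock.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def tree_size(node, depth=0):
    """Count the nodes of a binary tree given as nested (left, right) tuples or None."""
    if node is None:
        return 0
    left, right = node
    if depth < 2:
        future = pool.submit(tree_size, right, depth + 1)   # "spawn" one subtask
        down = tree_size(left, depth + 1)                   # compute the other one here
        return 1 + down + future.result()                   # "sync": wait for the subtask
    # Deeper levels run sequentially to keep the number of blocked threads bounded.
    return 1 + tree_size(left, depth + 1) + tree_size(right, depth + 1)

def build(height):
    """Full binary tree of the given height."""
    return None if height == 0 else (build(height - 1), build(height - 1))

size = tree_size(build(4))
```

A work-stealing framework avoids the depth cut-off because a worker that waits in sync can execute other pending tasks instead of blocking.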


The implementation of multi-core saturation for BDDs is identical, except 
that we parallelize on the “then” and “else” successors of a BDD node, instead 
of on the “down” and “right” successors of an LDD node. 

To add on-the-fly transition relation learning to this algorithm, we simply 
modify the loop at line 9 as follows: 

 9     for i ∈ [k, k'):
10         learn-transitions(S, i, d)
11         S ← relprodunion(S, R_i)


The learn-transitions function provided by LTSMIN updates relation i
given a set of states S. The function first restricts S to so-called short states S^i,
which is the projection of S on the state variables that are touched by relation i.
Then it calls the next-state function of the PINS interface for each new short
state and it updates R_i with the new transitions.

Updating transition relations from multiple threads is not completely trivial. 
LTSMIN solves this using lock-free programming with the compare-and-swap 
operation. After collecting all new transitions, LTSMIN computes the union with
the known transitions and uses compare-and-swap to update the global relation; 
if this fails, the union is repeated with the new known transitions. 
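
The retry loop described here can be sketched in Python as follows. Python has no compare-and-swap primitive, so the sketch assumes a small AtomicRef class in which a lock stands in for the hardware instruction; the class, the function add_transitions and the use of frozensets for relations are all hypothetical and do not reflect LTSMIN's code.

```python
import threading

class AtomicRef:
    """Toy atomic reference; a lock emulates a hardware compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Install `new` only if the stored object is still `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

def add_transitions(global_relation, new_transitions):
    """Merge `new_transitions` into the shared relation, retrying on conflict."""
    while True:
        known = global_relation.load()
        merged = known | new_transitions      # union with the known transitions
        if global_relation.compare_and_swap(known, merged):
            return
        # Another thread updated the relation first: repeat the union
        # with the newly known transitions.

relation = AtomicRef(frozenset())
threads = [
    threading.Thread(
        target=add_transitions,
        args=(relation, frozenset((t, i) for i in range(50))),
    )
    for t in range(8)
]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Because an update is only installed when the relation is unchanged since it was read, no transitions are lost, at the price of recomputing the union after a conflict.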


4 Contributed Tools 


We present several new tools and extensions to existing tools produced in this 
work. The new tools support experiments and comparisons between various DD 
formats. The extension to Sylvan and LTSMIN provides end-users with multi- 
core saturation for reachability analysis. 


4.1 Tools for Experimental Purposes 


For the empirical evaluation, we need to isolate the reachability analysis of a 
given LDD (or BDD or MDD). To that end, we implemented three small tools 
that only compute the set of reachable states, namely lddmc for LDDs, bddmc 
for BDDs and medmc for MDDs using the library Meddly. These tools are given 
an input file representing the model, compute the set of reachable states, and 
report the number of states and the required time to compute all reachable 
states. Additionally we provide the tools ldd2bdd and ldd2meddly that convert 
an LDD file to a BDD file and to an MDD file. The LDD input files are generated 
using LTSMIN (see below). These tools can all be found online (see footnote 1).


4.2 Tools for On-The-Fly Multi-core Saturation 


On-the-fly multi-core saturation is implemented in the LTSMIN toolset, which can
be found online (see footnote 2). The examples in this section are also online
(see footnote 3). On-the-fly multi-core saturation for Petri nets is available in
LTSMIN’s tool pnml2lts-sym. This tool computes all reachable markings with
parallel saturation. The command line to run it on Fig. 1 is
pnml2lts-sym pnml/example.pnml --saturation=sat. The tool reports:
pnml2lts-sym: state space has 5 states, 16 nodes. That is, there are 5 reachable
markings and the final LDD has 16 nodes.

Here the syntactic variable order of the places in pnml/example.pnml 
is used. To use a better variable order, the option -r is added to the 
command line. For instance, adding -rf runs Force, while -rbs runs Sloan’s
algorithm (as implemented in the well-known Boost library). Running 
pnml2lts-sym pnml/example.pnml --saturation=sat -rf reports that the final 
LDD has only 12 nodes. 

The naming convention of LTSMIN’s binaries follows the Partitioned Next-State
Interface (PINS) architecture [5,22,25]. PINS forms a bridge between
several language front-ends and algorithmic back-ends. Consequently, besides
pnml2lts-sym, LTSMIN also provides {pnml,dve,prom}2lts-{dist,mc,sym} and
several other combinations. These binaries generate the state space for the
languages PNML, DVE and Promela, by means of distributed explicit-state,
multi-core explicit-state and multi-core symbolic algorithms, respectively.
Additionally, LTSMIN supports checking for deadlocks and invariants, and verifying
LTL properties and μ-calculus formulas. In this work we focus on state space
generation with the symbolic back-end only.


1 https://github.com/trolando/sylvan.
2 https://github.com/utwente-fmt/ltsmin.
3 https://github.com/trolando/ParallelSaturationExperiments.

We now demonstrate multi-core saturation for Promela models. Consider the 
file Promela/garp_1b2a.prm which is an implementation of the GARP proto- 
col [23]. To compute the reachable state space with the proposed algorithm and 
Force order, run: prom2lts-sym --saturation=sat Promela/garp_1b2a.prm -rf. 
On a consumer laptop with 8 hardware threads, LTSMIN reports 385,000,995,634
reachable states within 1 min. To run the example with a single worker, run
prom2lts-sym --saturation=sat Promela/garp_1b2a.prm -rf --lace-workers=1.
On the same laptop, the algorithm runs in 4 min with 1 worker. We thus have a
speedup of 4x with 8 workers for symbolic saturation on a Promela model.


5 Empirical Evaluation 


Our goal with the empirical study is five-fold. First, we compare our parallel 
implementation with only 1 core to the purely sequential implementation of the 
MDD library Meddly [4], in order to determine whether our implementation is 
competitive with the state-of-the-art. Second, we study parallel scalability up to 
16 cores for all models and up to 48 cores with a small selection of models. 
Third, we compare parallel saturation with LDDs to parallel saturation with 
ordinary BDDs, to see if we get similar results with BDDs. Fourth, we compare 
parallel saturation without on-the-fly transition learning to on-the-fly parallel 
saturation, to see the effects of on-the-fly transition learning on the performance 
of the algorithm. Fifth, we compare parallel saturation with other reachability 
strategies, namely chaining and BFS, to confirm whether saturation is indeed a 
better strategy than chaining and BFS. 

To perform this evaluation, we use the P/T Petri net benchmarks obtained 
from the Model Checking Contest 2016 [24]. These are 491 models in total, stored 
in PNML files. We use parallel on-the-fly saturation (in LTSMIN) with a generous 
timeout of 1 hour to obtain LDD files of the models, using the Force variable 
ordering and using the Sloan variable ordering. In total, 413 of potentially 982 
LDD files were generated. These LDD files simply store the list decision diagrams 
of the initial states and of all transition relations. We convert the LDD files to 
BDD files (binary decision diagrams) with an optimal number of binary variables. 
We also convert the LDD files to MDD files for the experiments using Meddly. 
This ensures that all solvers have the same input model with the same variable 
order. 


Table 1. The six solving methods that we use in the empirical evaluation. Five methods
are parallelized and one method is on-the-fly (OTF).


Method        Tool          Description           Input  Parallel  OTF
otf-ldd-sat   pnml2lts-sym  saturation            PNML   ✓         ✓
ldd-sat       lddmc         saturation            LDD    ✓
ldd-chaining  lddmc         chaining              LDD    ✓
ldd-bfs       lddmc         BFS                   LDD    ✓
bdd-sat       bddmc         saturation            BDD    ✓
mdd-sat       medmc         saturation in Meddly  MDD


Table 2. Number of benchmarks (out of 413) solved within 20 min with each method
with the given number of workers.


Method        Number of solved models with # workers
              1     2     4     8     16    Any
otf-ldd-sat   387   397   399   404   407   408
ldd-sat       388   393   399   402   402   404
ldd-chaining  351   354   360   367   371   371
ldd-bfs       325   331   347   360   362   362
bdd-sat       395   396   401   402   403   405
mdd-sat       375   -     -     -     -     375


See Table 1 for the list of solving methods. As described in Sect. 4, we
implement the tools lddmc, bddmc and medmc to isolate reachability computation
for the purposes of this comparison, using respectively the LDDs and BDDs of
Sylvan and the MDDs of Meddly. The on-the-fly parallel saturation using LDDs
is performed with the pnml2lts-sym tool of LTSMIN. We use the command line
pnml2lts-sym ORDER --lace-workers=WORKERS --saturation=sat FILE, where ORDER
is -rf for Force and -rbs for Sloan and WORKERS is a number from the set
{1, 2, 4, 8, 16}.

All experimental scripts, input files and log files are available online (see 
footnote 3). The experiments are performed on a cluster of Dell PowerEdge M610 
servers with two Xeon E5520 processors and 24 GB internal memory each. The 
tools are compiled with gcc 5.4.0 on Ubuntu 16.04. The experiments for up to 48 
cores are performed on a single computer with 4 AMD Opteron 6168 processors 
with 12 cores each and 128 GB internal memory. 

When reporting on parallel executions, the number of workers denotes the number
of hardware threads (cores) that were used.


Overview. After running all experiments, we obtain the results for 413 models
in total, of which 196 models use the Force variable ordering and 217 models
use the Sloan variable ordering. In the remainder of this section, we study these
413 benchmarks. See Table 2, which shows the number of models for which each
method could compute the set of reachable states within 20 min.


Table 3. Cumulative time and parallel speedups for each method-#workers combination
on the models where all methods solved the model in time. These are 301 models
in total: 151 models with Force, 150 models with Sloan.


Method        Order  Total time (sec) with # workers   Total speedup
                     1     2     4     8     16        2    4    8    16
otf-ldd-sat   Sloan  1850  1546  698   398   313       1.2  2.7  4.6  5.9
ldd-sat       Sloan  932   609   311   194   151       1.5  3.0  4.8  6.2
ldd-chaining  Sloan  4156  3019  1916  1121  863       1.4  2.2  3.7  4.8
ldd-bfs       Sloan  9030  5585  2990  1652  1219      1.6  3.0  5.5  7.4
bdd-sat       Sloan  708   419   212   139   115       1.7  3.3  5.1  6.1
mdd-sat       Sloan  572   -     -     -     -         -    -    -    -
otf-ldd-sat   Force  2704  1162  712   401   343       2.3  3.8  6.8  7.9
ldd-sat       Force  856   602   348   216   180       1.4  2.5  4.0  4.7
ldd-chaining  Force  3149  2560  1835  1160  1024      1.2  1.7  2.7  3.1
ldd-bfs       Force  4696  2951  1556  859   633       1.6  3.0  5.5  7.4
bdd-sat       Force  1041  733   384   253   206       1.4  2.7  4.1  5.1
mdd-sat       Force  1738  -     -     -     -         -    -    -    -

To correctly compare all runtimes, we restrict the set of models to those where 
all methods finish within 20 min with any number of workers. We retain in total 
301 models where no solver hit the timeout. See Table 3 for the cumulative times 
for each method and number of workers and the parallel speedup. Notice that 
this is the speedup for the entire set of 301 models and not for individual models. 
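
The difference between the total speedup and the speedup of individual models can be illustrated with a short Python sketch. The first part recomputes the total speedup of ldd-sat with the Sloan order from the cumulative times in Table 3. The per-model times in the second part are hypothetical numbers chosen only to show that a few large models dominate the total, while many small models dominate the average of individual speedups.

```python
# Cumulative times (seconds) for ldd-sat with the Sloan order, copied from Table 3.
total_time = {1: 932, 2: 609, 4: 311, 8: 194, 16: 151}

def total_speedup(times, workers):
    """Ratio of the cumulative 1-worker time to the cumulative time with `workers`."""
    return times[1] / times[workers]

table_row = {w: round(total_speedup(total_time, w), 1) for w in (2, 4, 8, 16)}

# Hypothetical per-model times as (1 worker, 16 workers); NOT taken from the paper.
models = [(100.0, 10.0), (0.2, 0.4), (0.1, 0.2)]
total = sum(t1 for t1, _ in models) / sum(t16 for _, t16 in models)
mean_individual = sum(t1 / t16 for t1, t16 in models) / len(models)
```

Here table_row reproduces the Total speedup entries of that row, and in the hypothetical data the total speedup exceeds the mean of the individual speedups because the two small models slow down with 16 workers.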


Comparing LDD saturation with Meddly’s saturation. We evaluate how ldd-sat 
with just 1 worker compares to the sequential saturation of Meddly. The goal 
is not to directly measure whether there is a parallel overhead from using par- 
allelism in Sylvan, as the algorithm in lddmc is fundamentally different because 
it uses LDDs instead of MDDs and the algorithm does not in-place saturate 
nodes, as also explained in Sect. 3. The low parallel overheads of Sylvan are
already demonstrated elsewhere [15,16,18]. Rather, the goal is to see how our 
version of saturation compares to the state-of-the-art. 

Table 2 shows that Meddly’s implementation (mdd-sat) and our implementation
(ldd-sat with 1 worker) are quite similar in the number of solved models. Meddly
solves 375 benchmarks and our implementation solves 388 within 20 min.

See Table 3 for a comparison of runtimes. Meddly solves the 150 models with
Sloan almost 2x as fast as our implementation in Sylvan, but is slower than our
implementation for the 151 models with Force. We observe for individual models
that the difference between the two solvers is within an order of magnitude for
most models, although there are some exceptions. Our implementation quickly
overtakes Meddly with additional workers.


Table 4. Parallel speedup for a selection of benchmarks on the 48-core machine (only
top 5 shown)


Model                                 Order  Time (sec)             Speedup
                                             1       24     48      24    48
(with ldd-sat)
Dekker-PT-015                         Sloan  77.3    4.7    2.4     16.3  32.5
PhilosophersDyn-PT-10                 Force  273.8   16.8   12.4    16.3  22.1
Angiogenesis-PT-10                    Sloan  333.2   28.5   16.5    11.7  20.2
SwimmingPool-PT-02                    Force  25.0    2.1    1.4     11.6  17.8
BridgeAndVehicles-PT-V20P10N20        Force  1035.8  101.8  60.7    10.2  17.1
(with otf-ldd-sat)
Dekker-PT-015                         Sloan  174.5   7.4    3.3     23.6  52.2
SwimmingPool-PT-07                    Sloan  1008.0  69.2   42.0    14.6  24.0
SmallOperatingSystem-PT-MT0256DC0064  Sloan  957.3   52.9   40.0    18.1  23.9
Kanban-PT-0050                        Sloan  940.6   78.7   48.9    11.9  19.2
TCPcondis-PT-10                       Force  68.4    5.7    3.8     11.9  17.8


Parallel Scalability. As shown in Table 3, using 16 workers, we obtain a modest
parallel speedup for saturation of 6.2x (with Sloan) and 4.7x (with Force). On 
individual models, the differences are large. The average speedup of the individ- 
ual benchmarks is only 1.8x with 16 workers, but there are many slowdowns 
for models that take less than a second with 1 worker. We take an arbitrary 
selection of models with a high parallel speedup and run these on a dedicated 
48-core machine. Table 4 shows that even up to 48 cores, parallel speedup keeps 
improving. We even see a speedup of 52.2x. For this superlinear speedup we have
two possible explanations. One is that there is some nondeterminism inherent in
any parallel computation; the other, noted in [20], is related to the “chaining”
in saturation.


Comparing LDD saturation with BDD saturation. As Table 3 shows, the ldd-sat 
and bdd-sat methods have similar performance and similar parallel speedups.


On-the-fly LDD saturation. Comparing the performance of offline saturation 
with on-the-fly saturation, we observe the same scalability with the Sloan vari- 
able order, but on-the-fly saturation requires roughly 2x as much time. With 
the Force variable order, on-the-fly saturation is slower but has a higher parallel 
speedup of 7.9x. 


Comparing saturation, chaining and BFS. We also compare the saturation algorithm
with other popular strategies to compute the set of reachable states,
namely standard (parallelized) BFS and chaining, given in Fig. 5. As Tables 2
and 3 show, chaining is significantly faster than BFS and saturation is again
significantly faster than chaining. In terms of parallel scalability, we see that
parallelized BFS scales better than the others, because it can already parallelize
in the main loop by computing successors for all relations in parallel, which
chaining and saturation cannot do. For the entire set of benchmarks, saturation
is the superior method; however, there are individual differences and for some
models, saturation is not the fastest method.


global: M transition relations R_0 ... R_{M-1}

 1 def bfs(S):
 2     U ← S
 3     while U ≠ ∅:
 4         U ← par-next(U, 0, M)
 5         U ← minus(U, S)
 6         S ← union(U, S)
 7     return S
 8 def par-next(S, i, k):
 9     if k = 1: return relprod(S, R_i)
10     do in parallel:
11         left ← par-next(S, i, k/2)
12         right ← par-next(S, i + k/2, k − k/2)
13     return union(left, right)

 1 def chaining(S):
 2     U ← S
 3     while U ≠ ∅:
 4         for i ∈ [0, M):
 5             U ← relprodunion(U, R_i)
 6         U ← minus(U, S)
 7         S ← union(U, S)
 8     return S


Fig. 5. Algorithms bfs and chaining implement the Parallel BFS and Chaining strategies
for reachability.
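
The difference between the two baseline strategies can be seen in a small explicit-set sketch in Python. It follows the structure of bfs and chaining, but it works on Python sets instead of decision diagrams, runs sequentially (the loop over relations in bfs is what par-next would parallelize), and additionally counts loop iterations. The example relations and all names are hypothetical.

```python
def relprod(states, relation):
    """Successors of `states`; `relation` is a set of (source, target) pairs."""
    return {dst for (src, dst) in relation if src in states}

def bfs(initial, relations):
    """Apply every relation to the current frontier once per iteration."""
    reached, frontier, steps = set(initial), set(initial), 0
    while frontier:
        steps += 1
        successors = set()
        for rel in relations:                 # par-next computes these in parallel
            successors |= relprod(frontier, rel)
        frontier = successors - reached       # minus(U, S)
        reached |= frontier                   # union(U, S)
    return reached, steps

def chaining(initial, relations):
    """Apply the relations one after another, feeding results forward."""
    reached, frontier, steps = set(initial), set(initial), 0
    while frontier:
        steps += 1
        for rel in relations:
            frontier = frontier | relprod(frontier, rel)   # relprodunion(U, R_i)
        frontier = frontier - reached
        reached |= frontier
    return reached, steps

# Hypothetical chain 0 -> 1 -> 2 -> 3, one relation per step.
relations = [{(0, 1)}, {(1, 2)}, {(2, 3)}]
bfs_result = bfs({0}, relations)
chaining_result = chaining({0}, relations)
```

On this chain both strategies reach the same four states, but chaining needs fewer iterations because the successors produced by one relation are immediately available to the next.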


6 Conclusion 


We presented a multi-core implementation of saturation for the efficient com- 
putation of the set of reachable states. Based on Sylvan’s multi-core decision 
diagram framework, the design of the saturation algorithm is mostly orthogo- 
nal to the type of decision diagram. We showed the implementation for BDDs 
and LDDs; the transition relation can be learned on-the-fly. The functionality
is accessible through the LTSmin high-performance model checker. This makes 
parallel saturation available for a whole collection of asynchronous specification 
languages. We demonstrated multi-core saturation for Promela and for Petri 
nets in PNML representation. 

We carried out extensive experiments on a benchmark of Petri nets from the 
Model Checking Contest. The total speedup of on-the-fly saturation is 5.9x on 16 
cores with the Sloan variable ordering and 7.9x with the Force variable ordering. 
However, there are many small models (computed in less than a second) in this 
benchmark. For some larger models we showed an impressive 52x speedup on a 
48-core machine. From our measurements, we further conclude that the efficiency 
and parallel speedup for the BDD variant is just as good as the speedup for 
LDDs. We compared efficiency and speedup of saturation versus other popular
exploration strategies, BFS and chaining. As expected, saturation is significantly 
faster than chaining, which is faster than BFS; this trend is maintained in the 
parallel setting. Our measurements show that the variable ordering (Sloan versus 
Force), and the model representation (pre-computed transition relations versus 
learned on-the-fly) do have an impact on efficiency and speedup. Parallel speedup 
should not come at the price of reduced efficiency. To this end, we compared our 
parallel saturation algorithm with one worker to saturation in Meddly. Meddly
solves fewer models within the timeout but is slightly faster in some cases;
parallel saturation, however, quickly overtakes Meddly with multiple workers.

Future work could include the study of parallel saturation on exciting new 
BDD types, like tagged BDDs and chained BDDs [8,19]. The results on tagged 
BDDs showed a significant speedup compared to ordinary BDDs on experiments 
in LTSmin with the BEEM benchmark database. Another direction would be to 
investigate the efficiency and speedup of parallel saturation in other applications, 
like CTL model checking, SCC decomposition, and bisimulation reduction. 


Abstract. An appealing feature of Signal Temporal Logic (STL) is the
existence of efficient monitoring algorithms both for Boolean and real- 
valued robustness semantics, which are based on computing an aggregate 
function (conjunction, disjunction, min, or max) over a sliding window. 
On the other hand, there are properties that can be monitored with the 
same algorithms, but that cannot be directly expressed in STL due to 
syntactic restrictions. In this paper, we define a new specification lan- 
guage that extends STL with the ability to produce and manipulate 
real-valued output signals and with a new form of until operator. The 
new language still admits efficient offline monitoring, but also allows us to
express some properties that in the past motivated researchers to extend 
STL with existential quantification, freeze quantification, and other fea- 
tures that increase the complexity of monitoring. 


1 Introduction 


Signal Temporal Logic (STL [16,17]) is a temporal logic designed to specify 
properties of real-valued dense-time signals. It gained popularity due to its
rigour and its ability to reason about analog and mixed signals, and it found
use in such domains as analog circuits, systems biology, and cyber-physical control
systems (see [3] for a survey). A major use of STL is in monitoring: given a signal
and an STL formula, an automated procedure can decide whether the formula 
holds at a given time point. 

Monitoring of STL is reliably efficient. A monitoring procedure typically 
traverses the formula bottom up, and for every sub-formula computes a satisfac- 
tion signal, based on satisfaction signals of its operands. Boolean monitoring is 
based on the computation of conjunctions and disjunctions over a sliding window 
(“until” is implemented using a specialized version of running conjunction), and 
robustness monitoring (computing how well a signal satisfies a formula [9,10]) is 
based on the computation of minimum and maximum over a sliding window. The 
complexity of both Boolean and robustness monitoring is linear in the length of 
the signal and does not depend on the width of temporal windows appearing in 
the formula. At the same time, for a range of applications, pure STL is either
not expressive enough or difficult to use, and specifying a desired property often 
becomes a puzzle of its own. The existence of robustness and other real-valued 
semantics does not always help, since a monitor can perform a limited set of 
operations that the semantics assigns to Boolean operators. For example, for 
robustness semantics, min and max are the only operations beyond the atomic 
proposition level. 

One way to work around the expressiveness issues of STL is pre-processing: 
a computation that cannot be performed by an STL monitor can be performed 
by a pre-processor and supplied as an extra input signal. For a number of rea- 
sons, this is not always satisfactory. First, for monitoring of continuous-time 
signals, there is a big gap between the logical definitions of properties and the 
implementation of monitors. In the continuous-time setting, properties are defined
using quantification, upper and lower bounds, and similar mathematical tools 
for dense sets, while a monitor works with a finite piecewise representation of a 
signal and performs a computation that is based on induction and other tools 
for discrete sets. Leaving this gap exposed to the user, who has to implement 
the pre-processing step, is not very user-friendly. Second, monitoring of some 
properties cannot be cleanly decomposed into a pre-processing step followed by 
standard STL monitoring. Later, we give a concrete example using an extended 
“until” operator, and for now, notice that “until” instructs the monitor to com- 
pute a conjunction over the window that is not fixed in advance, but is defined 
by its second operand. Because of that, multiple researchers have been motivated
to search for a more expressive superset of STL that would allow them to specify the
properties they were interested in.

One direction for extension is to add to the original quantifier-free logic 
(MTL, STL) a form of variable binding: a freeze quantifier as in STL* [6], a 
clock reset as in TPTL [1], or even first order quantification [2]. Unfortunately, 
such extensions are detrimental to the complexity of monitoring. When monitoring
logics with quantifiers using the standard bottom-up approach, subformulas
containing free variables evaluate not to Boolean- or real-valued signals, but to
maps from time to non-convex sets, and they cannot in general be efficiently 
manipulated (although for some classes of formulas monitoring of logics with 
quantifiers works well [4,13]). Perhaps the most benign in this respect but also 
least expressive extension is 1-TPTL (TPTL with one active clock), which is 
as expressive as MITL, but is easier to use and admits a reasonably efficient 
monitoring procedure [11]. 

An alternative direction is to define a quantifier-free specification language 
with more flexible syntax and sliding window operations. For example, Signal 
Convolution Logic (SCL [20]) allows to specify properties using convolution with 
a set of select kernels. In particular, it can express properties of the form “state- 
ment y holds on an interval for at least X% of the time”. In SCL, every formula 
has a Boolean satisfaction signal, but some works go further and allow a for- 
mula to produce a real-valued output signal based on the real-valued signals of 
its subformulas. This already happens for robustness of STL in a very limited 
way, and can be extended. For example, [19] presents temporal logic monitoring
as filtering, which allows deriving multiple different real-valued semantics.
Another work [7] focuses on the practical application of robustness in falsifica- 
tion and allows to choose between different possible robust semantics for “even- 
tually” and “always”, in particular to replace min or max with integration where 
necessary. 

This paper is our take on extending STL in the latter direction. We define a 
specification language that is more expressive than STL, but not less efficient to 
monitor offline, i.e., the complexity of monitoring is linear in the length of the 
signal and does not depend on the width of temporal windows in the formula 
(the latter property tends to be missing from the STL extensions, even when the 
authors can achieve linear complexity for a fixed formula). The most important 
features of the new language are as follows. 


1. We remove several syntactic constraints from STL: we allow a formula to have
a real-valued output signal; we allow these signals to be combined in a pointwise
way with arithmetic operations, comparisons, etc. This distinguishes us
from the works that use standard MTL or STL syntax and assign them new
semantics [10, 19].

2. We allow applying an efficiently computable aggregate function over a sliding
window. We currently focus on min and max, which are enough to specify
properties that motivated the development of more expressive and harder to
monitor logics.

3. We offer a version of the “until” operator that performs aggregation over a sliding
window of dynamic width, which depends on the satisfaction of some formula.
This distinguishes us from the works that focus on aggregation over a fixed
window [20].


Finally, we focus our attention on continuous-time piecewise-constant and
piecewise-linear signals; we describe the algorithms and prepare an implementation
only for piecewise-constant signals.


2 Motivating Examples 


Before formally defining the new language, let us look at some examples of 
properties that we would like to express. In particular, we look at properties that 
motivated the development of more expressive and harder to monitor logics. 


Example 1 (Stabilization). The first interesting property is stabilization 
around a value that is not known in advance, e.g., “x stays within 0.05 units of 
some value for at least 200 time units”. It is tempting to formalize this property
using existential quantification “there exists a threshold v, such that...”,
which is possible with first-order logic of signals (and was one of its motivational
properties [2]), but it is actually not necessary. Instead, we can compute
the minimum and maximum of x over the next 200 time units and compare
their distance to 0.1 = 2 · 0.05. In some imaginary language, we could write
max_{[0,200]} x − min_{[0,200]} x ≤ 0.1. At this point we propose to separate the
aggregate operators from the operator that defines the temporal window, which will
be useful later, when the “until” operator will define a window of variable width.
We use the operator On_{[a,b]} to define the temporal window of constant width
and the operators Min and Max (capitalized) to denote the minimum and maximum
over the previously defined window. Signal x stabilizes within 0.05 units
of an unknown value for 200 time units:


On_{[0,200]} Max x − On_{[0,200]} Min x ≤ 0.1


Figure 1 shows an example of a signal x(t) (red) performing damped oscillation 
with the period of 250 time units. Blue and green curves are the maximum 
and the minimum of x over a sliding window [t, t + 200]. Finally, the orange
Boolean signal (its y scale is on the right) evaluates to true (i.e., y = 1) when 
the maximum and minimum of x over the next 200 time units are within 0.1. 
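
As an illustration of why such a property is cheap to monitor, the following Python sketch evaluates the stabilization check on a discretely sampled signal. It is a simplification under stated assumptions: the signal is a list of samples rather than a continuous-time piecewise signal, the window width is given in samples, the window is truncated at the end of the signal, and the function names are hypothetical. The sliding maximum uses a monotonic deque, so the cost is linear in the number of samples and independent of the window width.

```python
from collections import deque

def sliding_max(xs, width):
    """out[i] = max(xs[i : i + width + 1]); the window is truncated at the end."""
    out = [None] * len(xs)
    dq = deque()                        # indices whose values decrease front to back
    for i in range(len(xs) - 1, -1, -1):
        while dq and xs[dq[-1]] <= xs[i]:
            dq.pop()                    # dominated by the new sample
        dq.append(i)
        while dq[0] > i + width:
            dq.popleft()                # index has left the window [i, i + width]
        out[i] = xs[dq[0]]
    return out

def sliding_min(xs, width):
    """Sliding minimum obtained from the sliding maximum of the negated signal."""
    return [-v for v in sliding_max([-x for x in xs], width)]

def stable(xs, width, tolerance):
    """True at i when max and min over the window starting at i differ by at most tolerance."""
    hi, lo = sliding_max(xs, width), sliding_min(xs, width)
    return [h - l <= tolerance for h, l in zip(hi, lo)]

# Hypothetical samples: the signal jumps once and then stays close to 0.5.
signal = [0.0, 0.5, 0.52, 0.48, 0.5, 0.5]
verdict = stable(signal, 2, 0.1)
```

For these samples the check fails only at the first position, where the window still contains the jump.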


Example 2 (Local Maximum). Consider the property: “the current value 
of x is a minimum or maximum in some neighbourhood of current time point”. 
Previously, a similar property became a motivation to extend STL with freeze 
quantifiers [6], but we can also express it by comparing the value of a signal with 
some aggregate information about its neighbourhood, which we can do similarly 
to the previous example. 

Current value of x is a local maximum on the interval [0, 85] relative to the
current time:

x ≥ On_{[0,85]} Max x


Figure 2 shows an example of a sine wave x(t) (red) with the period of 250 time 
units. The blue curve is the maximum of x over a sliding window [t, t + 85]. The orange
Boolean signal evaluates to true when the current value of x is a maximum for 
the next 85 time units. 


Fig. 1. Damped oscillation x(t) and its maximum and minimum over the window
[t, t + 200]. Plotted signals: x, On_{[0,200]} Max x, On_{[0,200]} Min x, and the
Boolean signal Stable[0,200]. (Color figure online)


Fig. 2. Sine wave x(t), its maximum over the window [t, t + 85], and whether x(t)
is a local maximum on the interval [t, t + 85]. Plotted signals: x, On_{[0,85]} Max x,
and the Boolean signal Localmax[0,85]. (Color figure online)


Example 3 (Stabilization Contd.). We want to be able to assert that x
becomes stable around some value not for a fixed duration, but until some signal
q becomes true. We will be able to do this with our version of the “until” operator.
Signal x is stable within 0.05 units of an unknown value until q becomes true:


(Max x U q) − (Min x U q) ≤ 0.1


Intuitively, for a given time point, we want the monitor to find the closest future 
time point where q holds, and compute Min and Max of x over the resulting
interval. Note that this property cannot be easily monitored in the framework 
of “STL with pre-processing”, since it requires the monitor to compute Min and 
Max over a sliding window of variable width, which depends on the satisfaction 
signal of q. 


Example 4 (Linear Increase). At this point, we can assert x to follow a
more complex shape, for example, to increase or decrease with a given slope.
Let T denote an auxiliary signal that linearly increases with rate 1 (like a clock
of a timed automaton), i.e., we define T(t) = t; this example works as well for
T(t) = t + c, where c is a constant. To specify that x increases with the rate 2.5,
we assert that the distance from x to 2.5·T stays within some bounds.

Signal x increases approximately with slope 2.5 during the next 100 time units:


$$\mathrm{On}_{[0,100]}\,\mathrm{Max}\ |x - 2.5\,T| - \mathrm{On}_{[0,100]}\,\mathrm{Min}\ |x - 2.5\,T| < 0.1$$


3 Syntax and Semantics 


From the examples above we can foresee what the new language looks like.
Formally, an (input) signal is a function $w : \mathbb{T} \to \mathbb{R}^n$, where the time
domain $\mathbb{T}$ is a closed real interval $[0, |w|] \subset \mathbb{R}$, and the number |w| is the
duration of the signal. We refer to signal components using their own letters:
$x, y, \dots \in \mathbb{T} \to \mathbb{R}$. We assume that every signal component is
piecewise-constant or piecewise-linear.

The semantics of a formula is a piecewise-constant or piecewise-linear function
from real time (thus, it has real-valued switching points) to a dual number
(rather than a real). We defer the discussion of dual numbers until Sect. 3.2; for
now we note that they extend reals, and a dual number can be written in the
form $a + b\varepsilon$, which, when $b \neq 0$, denotes a point infinitely close to a. We
denote the set of dual numbers as $\mathbb{R}_\varepsilon$. Our primary use of a dual number is
to represent a time point strictly after an event (switching point, threshold
crossing, etc.) but before any other event can happen; as a result we have to
allow an output signal to have a dual value, denoting a value that is attained at
this dual time point.


Syntax. We can write the abstract syntax of our language as follows:


$$\begin{aligned}
\varphi &::= c \mid x \mid f(\varphi_1 \cdots \varphi_n) \mid \mathrm{On}_{[a,b]}\,\psi \mid \psi\ \mathrm{U}^{d}_{[a,b]}\ \varphi \mid \varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2 \\
\psi &::= \mathrm{Min}\ \varphi \mid \mathrm{Max}\ \varphi
\end{aligned} \tag{1}$$

where c is a real-valued constant; x refers to an input signal; f is a real-valued
function symbol (e.g., sum, absolute value, etc.); for the On-operator, a and b
can be real numbers or (with some abuse of notation) $\pm\infty$, i.e., the interval may
refer to both past and future, bounded or unbounded; for the U-operator, d is a
real value, a and b are non-negative, and b can be $\infty$, i.e., the interval refers to
bounded or unbounded future. Let us go over some of the features of the new
language and then formally write down its semantics.


Point-wise Functions. Function symbol f ranges over real-valued functions
$\mathbb{R}^n \to \mathbb{R}$ that preserve the chosen shape of signals (and can be lifted to dual
numbers). In this paper, we focus on piecewise-constant and piecewise-linear
signals, so when f is applied point-wise to a piecewise-constant input, we want the
result to be piecewise-constant; when f is applied point-wise to a piecewise-linear
input, we want the result to be piecewise-linear. Examples of such functions are
addition, subtraction, min and max of finitely many operands (we use lowercase
min and max to denote a real-valued n-ary function), multiplication by a constant,
absolute value, etc.

Boolean Output Signals. Output signals of some formulas can informally
be interpreted as Boolean-valued. In Example 2, “x” and “$\mathrm{On}_{[0,85]}\,\mathrm{Max}\ x$” are
dual-valued, but the result of their comparison, “$x \geq \mathrm{On}_{[0,85]}\,\mathrm{Max}\ x$”, should be
interpreted as Boolean. Here, we take the simpler path and treat a Boolean
signal as a special case of a real-valued signal that can take the value of 0 or 1.
We expect comparison operators to produce a value in {0, 1}, e.g., $\varphi_1 < \varphi_2$ is a
shortcut for “if $\varphi_1 < \varphi_2$ then 1 else 0”. Standard Boolean connectives can then
be defined as follows:


$$\varphi_1 \wedge \varphi_2 = \min\{\varphi_1, \varphi_2\} \qquad \varphi_1 \vee \varphi_2 = \max\{\varphi_1, \varphi_2\} \qquad \neg\varphi = 1 - \varphi$$


Another option would be to distinguish Boolean-valued formulas on the syntactic 
level. 
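
As a minimal sketch of this encoding (the function names below are illustrative and not part of the paper or its tool), comparisons can return 0 or 1 and the connectives reduce to min, max, and subtraction from 1:

```python
# Boolean signals encoded as the reals 0.0 and 1.0 (illustrative names only).

def less_than(a: float, b: float) -> float:
    """Comparison as a {0, 1}-valued function: 'if a < b then 1 else 0'."""
    return 1.0 if a < b else 0.0

def conj(p: float, q: float) -> float:
    """Conjunction is the minimum of the operands."""
    return min(p, q)

def disj(p: float, q: float) -> float:
    """Disjunction is the maximum of the operands."""
    return max(p, q)

def neg(p: float) -> float:
    """Negation is 1 - p."""
    return 1.0 - p

# Point-wise use on sampled values of two signals x and y.
xs = [0.2, 0.7, 1.5]
ys = [0.5, 0.5, 0.5]
x_below_y = [less_than(x, y) for x, y in zip(xs, ys)]
x_below_one = [less_than(x, 1.0) for x in xs]
both = [conj(p, q) for p, q in zip(x_below_y, x_below_one)]
```

Because the connectives are ordinary point-wise functions, they fit the same case of the syntax as any other function symbol f.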


Temporal φ-Formulas. Symbol φ denotes a temporal formula that has a
dual-valued output signal. In other words, it can be evaluated at a time point and
produces a dual value. A φ-formula may:


1. refer to an input signal x;

2. apply a real-valued function f pointwise to the outputs of its φ-subformulas;

3. apply an aggregate function over the sliding window [a, b] (with some abuse
of notation a can be $-\infty$, and b can be $\infty$);

4. be an “until” formula, which is described in Sect. 3.3. 


Interval ψ-Formulas. A ψ-formula is evaluated on an interval and does not
have an output signal by itself. Instead, it supplies an aggregate operation that
will be computed when evaluating the containing On-formula or “until”-formula.
It should be possible to efficiently compute this aggregate operation over a sliding
window, and it should preserve the chosen shape of signals. Since we focus on
piecewise-constant and piecewise-linear signals, the two operations that we can
immediately offer are Min and Max, which can be efficiently computed over a
sliding window using the algorithm of Lemire [9,15], and preserve the
piecewise-constant and piecewise-linear shapes. In discrete time or for
piecewise-polynomial signals, we could use more aggregate operations, e.g.,
integration.


“Eventually” and “Always”. Standard STL “eventually” and “always”
operators can be expressed in the new language as follows:


$$\mathrm{F}_{[a,b]}\,\varphi = \mathrm{On}_{[a,b]}\,\mathrm{Max}\ \varphi \qquad \mathrm{G}_{[a,b]}\,\varphi = \mathrm{On}_{[a,b]}\,\mathrm{Min}\ \varphi$$


3.1 Semantics of Until-Free Fragment 


The semantics of the until-free fragment is straightforward. The semantics of a
φ-formula is a function $[\![\varphi]\!] : \mathbb{T} \to \mathbb{R}_\varepsilon$ mapping real time to a dual value. We
define it as:


$$\begin{aligned}
[\![x]\!](t) &= x(t) \\
[\![\mathrm{On}_{[a,b]}\,\psi]\!](t) &= [\![\psi]\!][t + a,\ t + b] \\
[\![f(\varphi_1 \cdots \varphi_n)]\!](t) &= f([\![\varphi_1]\!](t), \dots, [\![\varphi_n]\!](t))
\end{aligned} \tag{2}$$


We abuse the notation so that x is both a symbol referring to a component of
an input signal and the corresponding real-valued function; similarly, f is both
a function symbol and the corresponding function.

The semantics of a ψ-formula is a function
$[\![\psi]\!] : (\mathbb{R} \cup \{-\infty\}) \times (\mathbb{R}_\varepsilon \cup \{\infty\}) \to \mathbb{R}_\varepsilon$
from an interval of time with real lower bound to a dual value. The upper bound
of the interval can be dual-valued, which will be used by the “until” operation
(see Sect. 3.3).


$$[\![\mathrm{Min}\ \varphi]\!][a, b] = \min_{t \in [a,b]} [\![\varphi]\!](t) \qquad [\![\mathrm{Max}\ \varphi]\!][a, b] = \max_{t \in [a,b]} [\![\varphi]\!](t) \tag{3}$$


The way we define min and max over an interval for a discontinuous
piecewise-linear function relies on dual numbers, which we explain just below.


3.2 Dual Numbers 


Dual numbers extend reals with a new element ε that has the property $\varepsilon^2 = 0$.
A dual number can be written in the form $a + b\varepsilon$, where $a, b \in \mathbb{R}$. We denote
the set of dual numbers as $\mathbb{R}_\varepsilon$. Dual numbers were proposed by the English
mathematician W. Clifford in 1873 and later applied in geometry by the German
mathematician E. Study. One of the modern applications of dual numbers and their
extensions is in automatic differentiation [12]: one can exactly compute the value
of the first derivative at a given point using the identity $f(x + \varepsilon) = f(x) + f'(x)\varepsilon$.
Intuitively, ε can be understood as an infinitesimal value, and $a + b\varepsilon$ (for $b \neq 0$)
is a point that is infinitely close to a. Polynomial functions can be extended
to dual numbers, and via Taylor expansion, so can exponents, logarithms, and
trigonometric functions. We work with piecewise-constant and piecewise-linear
functions with real switching points, and we only make use of basic arithmetic.
For example, if on the interval $(b_1, b_2)$ the signal x is defined as $x(t) = a_1 t + a_0$,
then $x(b_1 + \varepsilon) = a_1 b_1 + a_0 + a_1\varepsilon$ and $x(b_2 - \varepsilon) = a_1 b_2 + a_0 - a_1\varepsilon$.

Fig. 3. Signals x and y for Example 8.

Fig. 4. Signals x and y for Examples 5 and 6.
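
The dual-number arithmetic described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name `Dual`, its fields, and the lexicographic ordering on (real part, ε-coefficient) are choices made for this sketch, not the representation used by the authors' tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Dual:
    """Dual number re + eps*ε with ε² = 0 (hypothetical helper class).

    order=True compares (re, eps) lexicographically, so a number with a
    positive ε-coefficient is just above its real part.
    """
    re: float
    eps: float = 0.0

    @staticmethod
    def lift(v):
        # Treat plain numbers as dual numbers with a zero ε-coefficient.
        return v if isinstance(v, Dual) else Dual(float(v), 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.re + o.re, self.eps + o.eps)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.re - o.re, self.eps - o.eps)

    def __mul__(self, other):
        # (a + bε)(c + dε) = ac + (ad + bc)ε, because ε² = 0.
        o = Dual.lift(other)
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)
    __rmul__ = __mul__

def linear_piece(t):
    """x(t) = a1*t + a0 with a1 = -0.5 and a0 = 1.5."""
    return -0.5 * t + 1.5

def poly(t):
    """g(t) = t² + 2t, used to illustrate f(x + ε) = f(x) + f'(x)ε."""
    return t * t + 2 * t

just_before_one = linear_piece(Dual(1.0, -1.0))  # x(1 - ε)
derivative_demo = poly(Dual(3.0, 1.0))           # g(3 + ε)
```

Here `just_before_one` is the dual number 1 + 0.5ε, in line with the identity $x(b_2 - \varepsilon) = a_1 b_2 + a_0 - a_1\varepsilon$, and the ε-coefficient of `derivative_demo` is the derivative of g at 3.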


Our primary use of a dual number is to represent a time point strictly after
an event (a switching point, a threshold crossing, etc.) but before any other
event can happen, i.e., we use $t + \varepsilon$ to represent the time point that happens
right after t. The coefficient 1 at ε denotes that time advances with the rate of
1 (although another consistently used coefficient works as well). Consequently,
we also allow an output signal to produce a dual value, denoting a value that is
attained at this dual time point. On the other hand, we require that signals are
defined over real time, switching points of piecewise signals are reals, and time 
constants in formulas are reals. That is, dual-valued time is only used internally 
by the temporal operators and cannot be directly observed. 


Minimum and Maximum of a Discontinuous Function. We also use
dual-valued time to define the result of Min and Max for a discontinuous
piecewise-linear function. The standard way to compute minimum and maximum of a
continuous piecewise-linear function on a closed interval is based on the fact
that they are attained at the endpoints of the interval or at the endpoints of
the segments on which the function is defined. Using dual numbers, we extend
it to discontinuous functions: if for $t \in (b_1, b_2)$, $x(t) = a_1 t + a_0$, then we consider
time points $b_1 + \varepsilon$ and $b_2 - \varepsilon$ as the candidates for reaching the minimum or
maximum. Let us demonstrate this with an example.


Example 5. Consider the signal x defined as: “$x(t) = -0.5t + 1.5$ if $t \in [0, 1)$;
$x(t) = 0.5t + 1$ if $t \geq 1$”, as shown in Fig. 4. Let us find the minimum
of x on the interval $[0, 2 + \varepsilon]$. By our definition,
$\min_{t \in [0, 2+\varepsilon]} x(t) = \min\{x(0), x(1 - \varepsilon), x(1), x(2 + \varepsilon)\} = x(1 - \varepsilon) = 1 + 0.5\varepsilon$.
This result should be understood as follows: x(t) approaches the value of 1 from
above with derivative −0.5, but never reaches it.


Example 6. Our definition of minimum and maximum allows us to correctly
compare values of piecewise-linear functions around their discontinuity points. In
Example 5, x never reaches the value of its lower bound, and our definition of
minimum produces a dual number that reflects this fact and also specifies the
rate at which x approaches its lower bound. This information would be lost
if we computed the infimum of x. Again consider the signals in Fig. 4, with x
defined as before, and “$y(t) = t$ if $t \in [0, 1)$; $y(t) = -0.5t + 1$ if $t \geq 1$”. Let us
evaluate at time t = 0 the formula $\mathrm{On}_{[0,2]}\,\mathrm{Min}\ x > \mathrm{On}_{[0,2]}\,\mathrm{Max}\ y$, which denotes
the property $\forall t, t' \in [0, 2].\ x(t) > y(t')$. From the previous example, we have
that $[\![\mathrm{On}_{[0,2]}\,\mathrm{Min}\ x]\!](0) = 1 + 0.5\varepsilon$. By a similar argument,
$[\![\mathrm{On}_{[0,2]}\,\mathrm{Max}\ y]\!](0) = y(1 - \varepsilon) = 1 - \varepsilon$, which means that y approaches 1
from below with the rate of 1. Since $1 + 0.5\varepsilon > 1 - \varepsilon$, our property holds at
time 0, as expected.
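
The candidate-point computation from Examples 5 and 6 can be sketched as follows. This is an illustration under our own simplifying assumptions: a dual number $a + b\varepsilon$ is stored as the tuple `(a, b)` (Python compares tuples lexicographically), every piece is closed on the left, and the interval of interest is the whole real domain [0, 2] of the pieces. The function name and the piece encoding are hypothetical.

```python
def candidate_values(pieces):
    """Collect candidate values for Min/Max of a piecewise-linear function.

    pieces: list of (lo, hi, hi_closed, slope, intercept); each piece is
    closed at lo. A dual value a + b*eps is encoded as the tuple (a, b).
    """
    values = []
    for lo, hi, hi_closed, slope, intercept in pieces:
        # Closed left endpoint: an ordinary real value.
        values.append((slope * lo + intercept, 0.0))
        if hi_closed:
            values.append((slope * hi + intercept, 0.0))
        else:
            # Open right endpoint: evaluate at hi - eps, which gives
            # slope*hi + intercept - slope*eps.
            values.append((slope * hi + intercept, -slope))
    return values

# Signals x and y of Examples 5 and 6, restricted to [0, 2].
x_pieces = [(0.0, 1.0, False, -0.5, 1.5), (1.0, 2.0, True, 0.5, 1.0)]
y_pieces = [(0.0, 1.0, False, 1.0, 0.0), (1.0, 2.0, True, -0.5, 1.0)]

min_x = min(candidate_values(x_pieces))  # tuple order = dual-number order
max_y = max(candidate_values(y_pieces))
property_holds = min_x > max_y
```

With these inputs `min_x` encodes 1 + 0.5ε and `max_y` encodes 1 − ε, so the comparison agrees with the conclusion of Example 6.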


We want to emphasize that while an output signal can take a dual value, its
domain is considered to be a subset of reals. The semantics of temporal operators
is allowed to internally use dual-valued time points, but has to produce an
output signal that is defined over real time. This ensures that a piecewise signal
always has real-valued switching points and that no event can happen at a
dual-valued time point.


Example 7. Consider a formula $\varphi = \mathrm{F}_{[0,2]}(x = \mathrm{On}_{[-\infty,+\infty]}\,\mathrm{Min}\ x)$, where x is as
in Fig. 4. The meaning of φ is that within 2 time units x reaches its global
minimum. In our semantics, this formula does not hold at time 0. By our definition,
the global minimum of x is $1 + 0.5\varepsilon$, so the semantics of the formula at time 0 is
equivalent to:


$$\begin{aligned}
[\![\varphi]\!](0) &= [\![\mathrm{F}_{[0,2]}(x = 1 + 0.5\varepsilon)]\!](0) \\
&= \text{if } \exists t \in \mathbb{T}.\ t \in [0, 2] \wedge x(t) = 1 + 0.5\varepsilon \text{ then } 1 \text{ else } 0
\end{aligned}$$


where $\mathbb{T} = [0, |w|] \subset \mathbb{R}$. There is no real value of time where x(t) yields a dual
value, so the formula does not hold.


3.3 Semantics of Until 


The On-operator allowed us to compute minima and maxima over a sliding
window of fixed width. In this section, we introduce a new version of the “until”
operator that allows the window to have variable width that depends on the
output signal of some formula.


Reinterpreting the Classical Until as “Find First”. Let us explain how
we extend the “until” operator to work in the new setting. There already exists
a real-valued robust semantics of “until”, but we do not believe it to be a good
specification primitive. Instead, we re-state the standard Boolean semantics and,
based on the re-stated version, introduce the new real-valued (actually,
dual-valued) semantics. Let us recall a possible semantics of untimed until in STL.
Informally, “until” computes a conjunction of the values of the first operand over
an interval that is not fixed, but defined by the second operand. Formally,


$$[\![p\ \mathrm{U}^{\mathrm{STL}}\ q]\!](t) = \exists t' \geq t.\ q(t') \wedge \forall s \in [t, t'].\ p(s)$$


To denote the STL version of “until” we write it with a superscript, $\mathrm{U}^{\mathrm{STL}}$, to
distinguish it from the new version that we define for our language. The version
of “until” that we use in this paper is non-strict in the sense of [17]; it requires
that p holds both at t and t′.

Efficient monitoring of STL “until” relies on instantiating the existential
quantifier. The monitor scans the signal backwards and instantiates t′ based on
the earliest time point where q is true. The monitor needs to consider three cases
shown in Figs. 5, 6 and 7.

Fig. 5. Case 1: q is never true in the future.

Fig. 6. Case 2: there exists the earliest time point where q becomes true.

Fig. 7. Case 3: q becomes true, but there is no earliest time point.


1. Figure 5: q is false for every t′ ≥ t. Then the value of $p\ \mathrm{U}^{\mathrm{STL}}\ q$ at t is false.

2. Figure 6: there exists the smallest t′ ≥ t where q is true (this includes the
case where t′ = t). Then the value of $p\ \mathrm{U}^{\mathrm{STL}}\ q$ at t is $\forall s \in [t, t'].\ p(s)$ (predicate
p is not shown in the figure). The monitor need not consider time points after
t′, since if “forall” produces false on a smaller interval, it will produce false
on a larger one.

3. Figure 7: q becomes true in the future, but there is no earliest time point.
In this case, the monitor needs to take the universal quantification over an
interval that ends just after t′ (the switching point of q), but before any
other event occurs. We can formalize this reasoning using dual numbers and
say that the value of $p\ \mathrm{U}^{\mathrm{STL}}\ q$ at t is $\forall s \in [t, t' + \varepsilon].\ p(s)$, where $t' + \varepsilon$ can be
intuitively understood as a time point that happens after t′, but before any
other event can occur.


Below is the equivalent semantics of STL until that resolves the existential
quantifier:


$$[\![p\ \mathrm{U}^{\mathrm{STL}}\ q]\!](t) = \begin{cases}
\forall s \in [t, t'].\ p(s), & \text{if there exists the smallest } t' \geq t \text{ s.t. } q(t') \\
\forall s \in [t, t' + \varepsilon].\ p(s), & \text{where } t' = \inf\{t'' \mid t'' \geq t \wedge q(t'')\}, \\
& \text{if } \exists t'' \geq t.\ q(t'') \text{, but there is no smallest such } t'' \\
\text{false}, & \text{otherwise}
\end{cases}$$


Then, a monitor evaluates the universal quantifier via a finite conjunction, since 
in practice the signal p has finite variability, i.e. every interval is intersected by 
a finite number of constant segments. 


Example 8. Let us consider two linear input signals: x(t) = t and y(t) = 2t − 1
(see Fig. 3), and let us evaluate the formula $(y \leq x)\ \mathrm{U}^{\mathrm{STL}}\ (x > 1)$ at time 0
using non-strict “until” semantics. We define the earliest time point where x > 1
becomes true to be $1 + \varepsilon$, thus we need to evaluate the expression
$\forall t \in [0, 1 + \varepsilon].\ y(t) \leq x(t)$. At time $1 + \varepsilon$, we get
$y(1 + \varepsilon) = 1 + 2\varepsilon > 1 + \varepsilon = x(1 + \varepsilon)$, thus the
“until” formula does not hold. Informally, we can interpret the result as follows:
when x becomes greater than 1, y becomes greater than x, while non-strict
“until” requires that there exists a point where both its left- and right-hand
operands hold at the same time.


New Until as “Find First”. At this point, extending “until” to produce a
dual value is straightforward. With every time point, “until” possibly associates
an interval, and we can compute an arbitrary aggregate function over it, instead
of just conjunction. In fact, we introduce two flavors of “until”. The first version,
$\psi\ \mathrm{U}^{d}_{[a,b]}\ \varphi$, works as follows. For every time point t, we either associate an
interval ending when φ becomes non-zero (i.e., starts holding), or we report that
no suitable end point was found. When such an interval exists, we evaluate ψ on it.
When the interval does not exist, we produce d. Formally,


$$[\![\psi\ \mathrm{U}^{d}_{[a,b]}\ \varphi]\!](t) = \begin{cases}
[\![\psi]\!][t, t'], & \text{if there exists the smallest } t' \in [t + a, t + b] \text{ s.t. } [\![\varphi]\!](t') \neq 0 \\
[\![\psi]\!][t, t' + \varepsilon], & \text{where } t' = \inf\{t'' \mid t'' \in [t + a, t + b] \wedge [\![\varphi]\!](t'') \neq 0\}, \\
& \text{if } \exists t'' \in [t + a, t + b].\ [\![\varphi]\!](t'') \neq 0 \text{, but there is no smallest such } t'' \\
d, & \text{otherwise}
\end{cases}$$


The second version, $\varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2$, does not perform aggregation, but evaluates $\varphi_1$
at the time point where $\varphi_2$ becomes non-zero, or produces d if such a time point
does not exist:


$$[\![\varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2]\!](t) = \begin{cases}
[\![\varphi_1]\!](t'), & \text{if there exists the smallest } t' \in [t + a, t + b] \text{ s.t. } [\![\varphi_2]\!](t') \neq 0 \\
[\![\varphi_1]\!](t' + \varepsilon), & \text{where } t' = \inf\{t'' \mid t'' \in [t + a, t + b] \wedge [\![\varphi_2]\!](t'') \neq 0\}, \\
& \text{if } \exists t'' \in [t + a, t + b].\ [\![\varphi_2]\!](t'') \neq 0 \text{, but there is no smallest such } t'' \\
d, & \text{otherwise}
\end{cases}$$


In a similar way, we could define past versions of “until”, where the interval [a, b]
refers to the past; we do not discuss them here due to space constraints.

STL Until. The standard STL “until” can be expressed in the new language
as follows:

$$\varphi_1\ \mathrm{U}^{\mathrm{STL}}_{[a,b]}\ \varphi_2 = (\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{0}_{[a,b]}\ \varphi_2$$


Lookup. Using “until”, we can express the “lookup” operator that queries the
value of a signal at a point in the future, or returns some default value if the
point does not exist.

$$\mathrm{D}^{d}_{a}\,\varphi = \varphi \downarrow \mathrm{U}^{d}_{[a,a]}\ 1$$


Example 9 (Spike). The ST-Lib library [14] uses the following formula to
define a start point of a spike: $x' > m \wedge \mathrm{F}_{[0,d]}(x' < -m)$, where x′ is the
approximation of the right derivative $x'(t) = (x(t + \delta) - x(t))/\delta$, m is the
magnitude of the spike, and d is the width. Using the lookup operator, we can include
the definition of x′ in the property itself:


$$(\mathrm{D}^{y}_{\delta}\,x - x)/\delta > m \ \wedge\ \mathrm{F}_{[0,d]}\big((\mathrm{D}^{y}_{\delta}\,x - x)/\delta < -m\big)$$


where y gives the value of the signal outside of its original domain.

Fig. 8. Before time 2, an event p is followed by an event q.

Fig. 9. A sequence of spikes and a Boolean signal marking the detected start
times of spikes. (Color figure online)


Example 10 (Spike of Given Width and Height). Our language offers
several alternative ways to define a spike. We can define a (start point of a)
spike by composing two ramps: an increasing one, where the signal x increases
by at least m within w time units, and a decreasing one, where x decreases by
at least m within w time units; the two ramps should be at most w units apart.
The parameter w is the half-width of the spike.


$$(\mathrm{On}_{[0,w]}\,\mathrm{Max}\ x > x + m) \wedge \mathrm{F}_{[0,w]}(\mathrm{On}_{[0,w]}\,\mathrm{Min}\ x < x - m)$$


Figure 9 shows an example of a series of spikes (blue) and a Boolean signal (red) 
that marks the detected start times of spikes. 


Example 11 (TPTL-like Assertion). The second form of “until” allows us
to reason explicitly about time points and durations, somewhat similarly to
TPTL. Consider the property “within 2 time units, we should observe an event
p followed by an event q” (Fig. 8 shows an example of a satisfying signal). With
some case analysis, this property can be expressed in MTL [5], but probably the
best way to express it is offered by TPTL, which allows us to assert
“$c.\ \mathrm{F}(p \wedge \mathrm{F}(q \wedge c \leq 2))$”, meaning “reset a clock c, eventually, we should observe
p and from that point, eventually we should observe q, while the clock value will
be at most 2”. To express the property in our language, we introduce three
auxiliary signals: T(t) = t (which we use in some other examples as well),
$\mathit{pdelay} = (T \downarrow \mathrm{U}^{\infty}\ p) - T$, which denotes the duration until the next occurrence
of p, and similarly $\mathit{qdelay} = (T \downarrow \mathrm{U}^{\infty}\ q) - T$, the duration until the next
occurrence of q. Then, the property can be expressed as:
$\mathit{pdelay} + (\mathit{qdelay} \downarrow \mathrm{U}^{\infty}\ p) \leq 2$.


4 Monitoring 


Similarly to other works on STL monitoring (e.g., [9]), we implement the algo- 
rithms for a subset of the language, and support the remaining operators via 
rewriting rules. 




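Rewriting of Until. Similarly to STL, the timed “until” operator in our
language can be expressed in terms of “eventually” (which is expressed using
On), “lookup”, and untimed “until”.


$$\begin{aligned}
(\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\,\varphi_2 \text{ then } d \text{ else } \mathrm{On}_{[0,a]}\,\mathrm{Min}\,((\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}\ \varphi_2) \\
(\mathrm{Max}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\,\varphi_2 \text{ then } d \text{ else } \mathrm{On}_{[0,a]}\,\mathrm{Max}\,((\mathrm{Max}\ \varphi_1)\ \mathrm{U}^{d}\ \varphi_2) \\
\varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\,\varphi_2 \text{ then } d \text{ else } \mathrm{D}^{d}_{a}\,(\varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2)
\end{aligned}$$


Let us prove that the first equivalence is true; for the other two the proof
idea is similar. Let t be the time point where we evaluate $(\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2$ and its
rewriting. If there is no time point $s \in [t + a, t + b]$ where $\varphi_2$ holds, both the original
formula and its rewriting evaluate to d. Otherwise, let s be the earliest time point
in $[t + a, t + b]$ where $\varphi_2$ holds, which can be a real or dual value, as explained
in Sect. 3.3. Then the original formula evaluates to $\min\{[\![\varphi_1]\!](t') \mid t' \in [t, s]\}$.
The rewritten formula at t evaluates to
$\min\{[\![(\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}\ \varphi_2]\!](t') \mid t' \in [t, t + a]\}$.
Notice that for every t′ there is a time point in the future, which we denote
g(t′), where $\varphi_2$ holds, which is at most s, and for t′ = t + a it is exactly s. That
is, the rewritten formula evaluates to

$$\min\{\min\{[\![\varphi_1]\!](t'') \mid t'' \in [t', g(t')]\} \mid t' \in [t, t + a]\} = \min\{[\![\varphi_1]\!](t'') \mid t'' \in \textstyle\bigcup\{[t', g(t')] \mid t' \in [t, t + a]\}\}.$$

Notice that since $g(t') \in [t', s]$ and $g(t + a) = s$, then
$\bigcup\{[t', g(t')] \mid t' \in [t, t + a]\} = [t, s]$, and thus
the rewritten formula evaluates to the same value as the original one.

Referring to Both Future and Past. In the syntax, we allow the $\mathrm{On}_{[a,b]}$
operator to refer to both future and past, i.e., we allow the case when a < 0
and b > 0. Algorithms for Min/Max over a running window typically cannot
work with this situation directly, and we need to apply the following rewriting:
if a < 0 and b > 0,


$$\begin{aligned}
\mathrm{On}_{[a,b]}\,\mathrm{Min}\ \varphi &= \min\{\mathrm{On}_{[a,0]}\,\mathrm{Min}\ \varphi,\ \mathrm{On}_{[0,b]}\,\mathrm{Min}\ \varphi\} \\
\mathrm{On}_{[a,b]}\,\mathrm{Max}\ \varphi &= \max\{\mathrm{On}_{[a,0]}\,\mathrm{Max}\ \varphi,\ \mathrm{On}_{[0,b]}\,\mathrm{Max}\ \varphi\}
\end{aligned}$$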


Language of the Monitor. The following subset of the language is equally
expressive as the full language presented in (1). We implement the monitoring
algorithms for this language, and support the full syntax of (1) via rewriting.


$$\begin{aligned}
\varphi &::= c \mid x \mid f(\varphi_1 \cdots \varphi_n) \mid \mathrm{On}_{[a,b]}\,\psi \mid \psi\ \mathrm{U}^{d}\ \varphi \mid \varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2 \mid \mathrm{D}^{d}_{a}\,\varphi \\
\psi &::= \mathrm{Min}\ \varphi \mid \mathrm{Max}\ \varphi
\end{aligned}$$


where either a ≥ 0 or b ≤ 0, i.e., the interval [a, b] cannot refer to both future
and past.

All operators in the language of the monitor admit efficient offline monitoring.
Minimum and maximum over a sliding window required by the On-operator can
be computed using a variation of Lemire’s algorithm [9,15]; the “lookup” operator
D shifts its input signal by a constant distance; and for untimed “until” we can
scan the input signal backwards and perform a special case of running minimum
or maximum.

4.1 Monitoring Algorithms


In this section, we briefly describe monitoring algorithms for piecewise-constant 
signals. 


Representation of Signals. We represent a piecewise-constant function
$\mathbb{T} \to \mathbb{R}$ or $\mathbb{T} \to \mathbb{R}_\varepsilon$ as a sequence of segments $(s_0, s_1, \dots, s_{m-1})$, where every
segment $s_i = J_i \mapsto v_i$ maps an interval $J_i$ to a real or dual value $v_i$. The intervals
$J_i$ form a partition of the domain of the signal and are ordered in ascending time
order, i.e., $\sup J_i = \inf J_{i+1}$ and $J_i \cap J_{i+1} = \emptyset$. The domain of the signal
corresponding to the sequence $u = (J_0 \mapsto v_0, \dots, J_{m-1} \mapsto v_{m-1})$ is denoted by
$\mathrm{dom}(u) = J_0 \cup \dots \cup J_{m-1}$. For example, if the function x(t) is defined as
x(t) = 0 if $t \in [0, 1)$, and x(t) = 1 if $t \in [1, 2]$, then x(t) is represented by the
sequence $u_x = ([0, 1) \mapsto 0, [1, 2] \mapsto 1)$, and $\mathrm{dom}(u_x) = [0, 2]$.

Empty brackets () denote an empty sequence that does not represent a valid
signal, but can be used by algorithms as an intermediate value. We manipulate
the sequences with the following operations. The function append adds a segment
to the end of a sequence: $\mathrm{append}((s_0, \dots, s_{m-1}), s') = (s_0, \dots, s_{m-1}, s')$. The
function prepend adds a segment to the start of a sequence:
$\mathrm{prepend}((s_0, \dots, s_{m-1}), s') = (s', s_0, \dots, s_{m-1})$. This may produce a sequence
where the first segment does not start at time 0. While such a sequence does not
represent a valid signal, it can be used by the algorithms as an intermediate value.
The function removeLast removes the last segment of a sequence, assuming it was
non-empty: $\mathrm{removeLast}((s_0, \dots, s_{m-1})) = (s_0, \dots, s_{m-2})$.

An output signal of a formula is scalar-valued and is represented by one such
sequence. An input signal usually has multiple components, i.e., it is a function
$\mathbb{T} \to \mathbb{R}^n$, and is represented by a set of n sequences.
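
One way to picture this representation is the following sketch. It is a simplification made for illustration: a segment is a plain tuple `(start, end, value)`, so open and closed endpoints and dual values are not tracked, and the helper names are ours rather than the tool's.

```python
from typing import List, Tuple

# A segment J -> v, simplified to (start, end, value); whether an endpoint
# is open or closed is deliberately not represented in this sketch.
Segment = Tuple[float, float, float]
Sequence = List[Segment]

def append(u: Sequence, s: Segment) -> Sequence:
    """Add a segment to the end of a sequence."""
    return u + [s]

def prepend(u: Sequence, s: Segment) -> Sequence:
    """Add a segment to the start of a sequence."""
    return [s] + u

def remove_last(u: Sequence) -> Sequence:
    """Remove the last segment of a non-empty sequence."""
    if not u:
        raise ValueError("cannot remove a segment from an empty sequence")
    return u[:-1]

def dom(u: Sequence) -> Tuple[float, float]:
    """Bounds of the domain covered by a non-empty sequence."""
    return (u[0][0], u[-1][1])

# The signal from the text: x(t) = 0 on [0, 1) and x(t) = 1 on [1, 2].
u_x = append(append([], (0.0, 1.0, 0.0)), (1.0, 2.0, 1.0))
```

Returning new lists keeps the helpers free of side effects, which is convenient for small examples; an implementation concerned with performance would more likely mutate a deque in place.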


On-Formulas. For $\mathrm{On}_{[a,b]}\,\mathrm{Min}\ \varphi$ and $\mathrm{On}_{[a,b]}\,\mathrm{Max}\ \varphi$, a monitor needs to
compute the minimum or maximum of the output signal of φ over the sliding window.
The corresponding algorithm was developed for discrete time by Lemire [15] and
later adapted for continuous time [9].
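
For intuition, the discrete-time idea can be sketched with a monotonic deque. The code below is our own illustration for uniformly sampled signals and a future window of w samples; it is not the continuous-time adaptation of [9] and not the code of the tool, and the function names are hypothetical.

```python
from collections import deque

def sliding_max_future(xs, w):
    """out[i] = max(xs[i : i + w + 1]), truncated at the end of the signal.

    Every index enters and leaves the deque at most once, so the running
    time is linear in len(xs) and independent of w.
    """
    n = len(xs)
    out = [None] * n
    window = deque()  # indices whose values decrease from front to back
    nxt = 0           # next index that has not entered the window yet
    for i in range(n):
        last = min(i + w, n - 1)
        while nxt <= last:
            # Values dominated by the new sample can never be the maximum.
            while window and xs[window[-1]] <= xs[nxt]:
                window.pop()
            window.append(nxt)
            nxt += 1
        # Drop indices that fell out of the window on the left.
        while window[0] < i:
            window.popleft()
        out[i] = xs[window[0]]
    return out

def sliding_min_future(xs, w):
    """Minimum over the same window, obtained by negating the signal."""
    return [-v for v in sliding_max_future([-v for v in xs], w)]

samples = [1, 3, 2, 5, 4]
max_over_window = sliding_max_future(samples, 1)
min_over_window = sliding_min_future(samples, 1)
```

On sampled Boolean signals encoded as 0 and 1, the same two functions play the role of the bounded “eventually” and “always” operators.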


Lookup-Formulas. Computing the output signal for $\mathrm{D}^{d}_{a}\,\varphi$ is straightforward.
We need to shift every segment of $u_\varphi$ (the representation of the output signal
of φ) to the left by a, truncating at 0, and append a padding segment with the
value of d.
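
A sketch of this shift on a simplified segment list is shown below. The assumptions are ours: segments are `(start, end, value)` tuples without open or closed endpoint information, the signal starts at time 0, and the offset a is non-negative (a future lookup). The function name is hypothetical.

```python
def lookup(u, a, d):
    """Shift the segments of u to the left by a >= 0 and pad with d.

    u is a list of (start, end, value) tuples covering [0, end_of_signal].
    """
    end_of_signal = u[-1][1]
    out = []
    for start, end, value in u:
        if end - a > 0:
            # Keep the part of the shifted segment that lies after time 0.
            out.append((max(start - a, 0.0), end - a, value))
    if a > 0:
        # Time points whose lookup target lies beyond the signal get d.
        out.append((max(end_of_signal - a, 0.0), end_of_signal, d))
    return out

u_x = [(0.0, 1.0, 0.0), (1.0, 2.0, 1.0)]
shifted = lookup(u_x, 0.5, -1.0)
```

The output covers the same domain as the input, which keeps it compatible with point-wise operations such as the difference used for the derivative approximation in Example 9.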


Until-Formulas. Informally, monitoring the “until”-formulas $\mathrm{Min}\ \varphi_1\ \mathrm{U}^{d}\ \varphi_2$,
$\mathrm{Max}\ \varphi_1\ \mathrm{U}^{d}\ \varphi_2$, and $\varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2$ works as follows. The monitor scans the output
signals of $\varphi_1$ and $\varphi_2$ backwards. While $\varphi_2$ evaluates to a non-zero value, the
monitor outputs the value of $\varphi_1$. When $\varphi_2$ evaluates to 0, the monitor outputs
either the default value (if the monitor did not yet encounter a non-zero value
of $\varphi_2$), or the running minimum or maximum of $\varphi_1$, or the value that $\varphi_1$ had at
the last time point where $\varphi_2$ was non-zero.

The functions until and untilAdd in Fig. 10 implement this idea. The inputs
to the function until are: sequences $u_1$ and $u_2$ representing the output signals
of $\varphi_1$ and $\varphi_2$ (with $\mathrm{dom}(u_1) = \mathrm{dom}(u_2)$), the default value d, and the function f
used for aggregation; f can be min, max, or the special function λx, y. x, which
returns the value of its first argument and which we use to monitor the formula
$\varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2$. The function until scans the input sequences backwards and iterates
over intervals where both input signals maintain a constant value (J). Each
such interval is passed to the function untilAdd, which updates the state of the
algorithm (v′, s) and constructs the output signal ($u_r$).

    function until(u1, u2, f, d)
        let u1 = (J1_0 ↦ v1_0, ..., J1_{m−1} ↦ v1_{m−1})
        let u2 = (J2_0 ↦ v2_0, ..., J2_{k−1} ↦ v2_{k−1})
        i ← m − 1; j ← k − 1
        (ur, s, v′) ← ((), 0, d)
        while i ≥ 0 ∧ j ≥ 0 do
            J ← J1_i ∩ J2_j
            (ur, s, v′) ← untilAdd(ur, s, v′, J, v1_i, v2_j)
            if ∃t ∈ J1_i. ∀t′ ∈ J2_j. t′ > t then
                j ← j − 1
            else if ∃t ∈ J2_j. ∀t′ ∈ J1_i. t′ > t then
                i ← i − 1
            else
                i ← i − 1; j ← j − 1
            end
        end
        return ur
    end

    function untilAdd(ur, s, v′, J, v1, v2)
        if v2 ≠ 0 then
            v′ ← v1
            s ← 1
        else if s ≠ 0 then
            v′ ← f(v′, v1)
        end
        prepend(ur, J ↦ v′)
        return (ur, s, v′)
    end

Fig. 10. Algorithm for monitoring “until”-formulas.
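
The backward scan can be sketched in Python as follows. This is our own simplified rendering of the idea, not the tool's code: segments are `(start, end, value)` tuples, both inputs cover the same domain, open and closed endpoints and dual values are ignored, and adjacent output segments with equal values are not merged.

```python
def until(u1, u2, f, d):
    """Backward scan for untimed 'until' on piecewise-constant signals.

    u1, u2: lists of (start, end, value) covering the same domain.
    f: aggregation function (min, max, or a function returning its first
       argument); d: default value used when u2 is never non-zero later.
    """
    i, j = len(u1) - 1, len(u2) - 1
    out, seen, acc = [], False, d
    while i >= 0 and j >= 0:
        start1, end1, v1 = u1[i]
        start2, end2, v2 = u2[j]
        start, end = max(start1, start2), min(end1, end2)
        if v2 != 0:
            # The right operand holds here: the interval of interest is a
            # single point, so the result is the current value of u1.
            acc, seen = v1, True
        elif seen:
            # The right operand holds later: extend the aggregate.
            acc = f(acc, v1)
        out.append((start, end, acc))
        # Step back in whichever sequence has been fully consumed.
        if start1 < start2:
            j -= 1
        elif start2 < start1:
            i -= 1
        else:
            i -= 1
            j -= 1
    out.reverse()
    return out

x = [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 2.0), (3, 4, 5.0)]
q = [(0, 2.5, 0), (2.5, 3, 1), (3, 4, 0)]
inf = float("inf")
min_until = until(x, q, min, inf)
max_until = until(x, q, max, -inf)
first_until = until(x, q, lambda acc, v: acc, inf)
```

In the spirit of Example 3, the point-wise difference between `max_until` and `min_until` can then be compared against a stability bound.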


5 Implementation and Experiments 


We implemented the monitoring algorithm in a prototype tool that is available 
at https://gitlab.com/abakhirkin/StlEval. The tool has a number of limitations, 
notably it can only use piecewise-constant interpolation (so we cannot evaluate 
examples that use the auxiliary signal T(t) = t) and does not support past- 
time operators. It is written in C++ and uses double-precision floating point 
numbers for time points and signal values. We evaluate the tool using a number 
of synthetic signals and a number of properties based on the ones described 
earlier in the paper. 


Signals. We use the following signals discretized with time step 1. 


— $x_{\sin}$: sine wave with amplitude 1 and period 250; see the red curve in Fig. 2.

— Xdecay ~ damped oscillation with period 250. For t € [0, 1000), x defined as 
Xdecay(t) = 1 sin(250¢ + 250)e7 20%, see red curve in Fig.1; for t > 1000, the 
pattern repeats;

— $x_{\mathrm{spike}}$: series of spikes; a single spike is defined for $t \in [0, 125)$ as
$x_{\mathrm{spike}}(t) = e^{-\frac{(t - 50)^2}{2 \cdot 10^2}}$, and after that the pattern repeats; see the blue
curve in Fig. 9.


Properties. We use the following properties: 


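— $\varphi_{\mathrm{stab}} = \mathrm{G}\,\mathrm{F}\,(\mathrm{On}_{[0,200]}\,\mathrm{Max}\ x - \mathrm{On}_{[0,200]}\,\mathrm{Min}\ x < 0.1)$: x always eventually
becomes stable around some value for 200 time units.

— $\varphi_{\mathrm{stab\text{-}0}} = \mathrm{G}\,\mathrm{F}\,\mathrm{G}_{[0,200]}(|x| < 0.05)$: x always eventually becomes stable around
0 for 200 time units.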

— Punti! = Glo,20x] F ((Max x) 7500, 0) (lx > 0.1))-((Min x) U0, (ll > 0.1)) < 
0.1, where x’ = (D? x- x), x always eventually becomes stable for at least 200 
time units and then starts changing with derivative of at least 0.1. 

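— $\varphi_{\mathrm{max\text{-}min}} = \mathrm{G}\,((x \geq \mathrm{On}_{[0,85]}\,\mathrm{Max}\ x) \Rightarrow \mathrm{F}\,(x \leq \mathrm{On}_{[0,85]}\,\mathrm{Min}\ x))$: every local
maximum is followed by a local minimum.

— $\varphi_{\mathrm{above\text{-}below}} = \mathrm{G}\,(x > 0.85 \Rightarrow (\mathrm{F}\ x < -0.85))$: if x is above 0.85, it should
eventually become below −0.85.

— $\varphi_{\mathrm{spike}} = (\mathrm{On}_{[0,16]}\,\mathrm{Max}\ x > x + 0.5) \wedge \mathrm{F}_{[0,16]}(\mathrm{On}_{[0,16]}\,\mathrm{Min}\ x < x - 0.5)$: spike of
half-width 16 and height at least 0.5.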

— $\varphi_{\mathrm{spike\text{-}stlib}} = \mathrm{F}\,(x' > 0.04 \wedge \mathrm{F}_{[0,25]}(x' < -0.04))$, where $x' = (\mathrm{D}^{0}_{1}\,x - x)$: spike of
width at most 25 and magnitude 0.04.


Some properties are expressed in our language using On- and “until”-operators,
and some are STL properties. This allows us to see how much time it takes to
monitor a more complicated property in our language (e.g., $\varphi_{\mathrm{stab}}$, stabilization
around an unknown value) compared to a similar but simpler STL property
(e.g., $\varphi_{\mathrm{stab\text{-}0}}$, stabilization around a known value). In our experiments we see a
constant factor between 2 and 5.

Table 1 shows the evaluation results. A row gives a formula and a signal
shape; a column gives the number of samples in the input signal, and a table
cell gives two time figures in seconds: the monitoring time excluding the time
required to read the input data, and the total runtime of an executable. We note
that for our tool, the total runtime is dominated by the time required to read
the input signal from a text file. For the three STL properties we include the
time it took AMT 2.0 (a monitoring tool written in Java [18]) and Breach (a
Matlab toolbox partially written in C++ [8]; Breach does not have a standalone
executable, so we leave the corresponding columns empty) to evaluate the
formula. This way we show that our implementation of STL monitoring has
good enough performance to be used as a baseline when evaluating the cost of
the added expressiveness in the new language. Time figures were obtained using
a PC with a Core i3-2120 CPU and 8GB RAM running 64-bit Debian 8.

Table 1. Monitoring time for different formulas and signals.

                               This paper                    AMT 2.0                  Breach
Formula          Signal    100k           1M             100k        1M          100k        1M
φ_stab           x_decay   0.004 / 0.05   0.048 / 0.39
φ_stab-0         x_decay   0.003 / 0.04   0.023 / 0.38   0.59 / 4.0  2.4 / 7.3   0.053 / -   0.42 / -
φ_until          x_decay   0.01  / 0.05   0.097 / 0.43
φ_max-min        x_sin     0.007 / 0.04   0.07  / 0.4
φ_above-below    x_sin     0.002 / 0.04   0.02  / 0.36   0.6 / 3.1   2.4 / 7.5   0.05 / -    0.4 / -
φ_spike          x_spike   0.01  / 0.05   0.1   / 0.45
φ_spike-stlib    x_spike   0.006 / 0.05   0.05  / 0.43   1.0 / 4.0   5.0 / 13    0.058 / -   0.47 / -


6 Conclusion and Future Work 


We describe a new specification language that extends STL with the ability to 
produce and manipulate real-valued output signals (while in STL, every formula 
has a Boolean output signal). Properties in the new language are specified in 
terms of minima and maxima over a sliding window, which can have fixed width,
when using a generalization of F- and G-operators, or variable width, when using
a new version of “until”. We show how the new language can express properties that
motivated the creation of more expressive and harder to monitor logics. Offline 
monitoring for the new language is almost as efficient as STL monitoring; the 
complexity is linear in the length of the input signal and does not depend on the 
constants appearing in the formula. 

There are multiple directions for future work; perhaps the most interesting one 
is adding integration over a sliding window (in addition to minimum and maximum). 
This is already allowed by some formalisms [7], and when added to 
our language it will allow us to assert that a signal approximates the behaviour of a 
system defined by a given differential equation (since we will be able to assert 
$y(t) = \int x(t)\,dt$). Before making integration available, we wish to investigate how 
to better deal with approximation errors in a specification language. Finally, we 
wish to make our language usable in falsification, which means that for every 
formula with Boolean output signal we wish to be able to compute a real-valued 
robustness measure. 
Abstract. Runtime Verification (RV) is the process of checking whether 
a run of a system holds a given property. In order to perform such a check 
online, the algorithm used to monitor the property must induce mini- 
mal overhead. This paper focuses on two areas that have received little 
attention from the RV community: Python programs and web services. 
Our first contribution is the VyPR runtime verification tool for single- 
threaded Python programs. The tool handles specifications in our, previ- 
ously introduced, Control-Flow Temporal Logic (CFTL), which supports 
the specification of state and time constraints over runs of functions. 
VyPR minimally (in terms of reachability) instruments the input program 
with respect to a CFTL specification and then uses instrumentation 
information to optimise the monitoring algorithm. Our second contribu- 
tion is the lifting of VyPR to the web service setting, resulting in the 
VyPR2 tool. We first describe the necessary modifications to the architecture 
of VyPR, and then describe our experience applying VyPR2 to a 
service that is critical to the physics reconstruction pipeline on the CMS 
Experiment at CERN. 


1 Introduction 


Runtime Verification [1] is the process of checking whether a run of a system 
holds a given property (often written in a temporal logic). This can be checked 
while the system is running (online) or after it has run (post-mortem or offline). 
Often this is presented abstractly as checking an abstraction of behaviour, cap- 
tured by a trace. This abstract setting often ignores the practicalities of instru- 
mentation and deployment. This paper presents a tool for the runtime verifica- 
tion of Python-based web services that efficiently handles the instrumentation 
problem and integrates with the widely used web-framework Flask [2]. This 
work is carried out within the context of verifying web-services used at the CMS 
Experiment at CERN. 

Despite the wealth of existing logics [3-9], in our work [10,11] performing 
verification of state and time constraints over Python-based web services on the 
CMS Experiment at CERN we have found that, in most cases, the existing logics 
operate at a high level of abstraction in relation to the program under scrutiny. 
This leads to (1) a less straightforward specification process for engineers, who 
have to think indirectly about their programs; and (2) difficulty writing spec- 
ifications about behaviour inside functions themselves. These observations led 
us to develop Control-Flow Temporal Logic [10,11] (CFTL), a logic that has a 
tight-coupling with the control flow of the program under scrutiny (so operates 
at a lower level of abstraction which, in our experience, makes writing specifi- 
cations with it easier for engineers) and is easy to use to specify state and time 
constraints over single runs of functions. 

After the introduction of CFTL (Sect. 2), the first contribution of this paper is 
a description of the VyPR tool (Sect. 3), which verifies single-threaded Python 
programs with respect to CFTL specifications. It does this by (1) providing 
PyCFTL, the Python binding for CFTL, for writing specifications; (2) instru- 
menting the input program minimally with respect to reachability; and (3) using 
the resulting instrumentation information to make its online monitoring algo- 
rithm more efficient. 

Since the development of VyPR as a prototype verification tool for CFTL, we 
have found that there are, to the best of our knowledge, no frameworks for fully- 
automated instrumentation and verification of multiple functions in web services 
with respect to low-level properties. Therefore, the second contribution of this 
paper is the lifting of CFTL and VyPR to the web service setting in a tool we call 
VyPR2 (Sect. 4). We present a general infrastructure for the runtime verification 
of Python-based web services with respect to CFTL specifications. Moving from 
VyPR to VyPR2 presents a number of challenges, which we discuss in detail. 
For the moment, we focus on web services that use the Flask framework, a 
Python framework that allows one to write a web service by writing Python 
functions to serve as end-points. VyPR2 admits a simple specification process 
using PyCFTL, performs automatic and optimised instrumentation of the web 
service under scrutiny, and provides a separate verdict server for collection of 
verdicts obtained by monitoring CFTL specifications. 

Our final contribution is a case study (Sect. 5) applying VyPR2 to the CMS 
Conditions Upload Service [12], a single-threaded Python-based web service used 
on the CMS Experiment at CERN. We find that our verification infrastructure 
induces minimal overhead on Conditions uploads, with experiments showing 
an overhead of approximately 4.7%. We also find unexpected violations of the 
specification, one of which has triggered investigations into a mechanism that was 
designed to be an optimisation but is in danger of adding unnecessary latency. 
Ultimately, VyPR2 has made analysis of the performance of a critical part of 
CMS’ physics reconstruction pipeline much more straightforward. 


2 Control-Flow Temporal Logic (CFTL) 


Both of the tools presented in this paper make use of the CFTL specification 
language [10,11]. We briefly describe this language, focusing on the kinds of 
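\[
\begin{array}{rcl}
\varphi & := & \forall q \in \Gamma_S : \varphi \mid \forall t \in \Gamma_T : \varphi \mid \varphi \lor \varphi \mid \neg\varphi \mid \mathrm{true} \mid \varphi_A \\
\varphi_A & := & S(x) = v \mid S(x) = S(x) \mid S(x) \in (n, m) \mid S(x) \in [n, m] \\
 & \mid & \mathrm{duration}(T) \in (n, m) \mid \mathrm{duration}(T) \in [n, m] \\
\Gamma_S & := & \mathrm{changes}(x) \mid \mathrm{future}_S(q, \mathrm{changes}(x)) \mid \mathrm{future}_S(t, \mathrm{changes}(x)) \\
\Gamma_T & := & \mathrm{calls}(f) \mid \mathrm{future}_T(q, \mathrm{calls}(f)) \mid \mathrm{future}_T(t, \mathrm{calls}(f)) \\
S & := & q \mid \mathrm{source}(T) \mid \mathrm{dest}(T) \mid \mathrm{next}_S(S, \mathrm{changes}(x)) \mid \mathrm{next}_S(T, \mathrm{changes}(x)) \\
T & := & t \mid \mathrm{incident}(S) \mid \mathrm{next}_T(S, \mathrm{calls}(f)) \mid \mathrm{next}_T(T, \mathrm{calls}(f))
\end{array}
\]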


Fig. 1. Syntax of CFTL. 


properties it can capture. CFTL is a linear-time temporal logic whose formulas 
reason over two central types of objects: states, instantaneous checkpoints in a 
program’s runtime; and transitions, the computation that must happen to move 
between states. 

Consider the following property, taken from the case study in Sect. 5: 


Whenever authenticated is changed, if it is set to True, then all 
future calls to execute should take no more than 1 second. 


This can be expressed in CFTL as 


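\[
\begin{aligned}
&\forall q \in \mathrm{changes}(\mathit{authenticated}) : \\
&\quad \forall t \in \mathrm{future}(q, \mathrm{calls}(\mathit{execute})) : \\
&\qquad q(\mathit{authenticated}) = \mathrm{True} \implies \mathrm{duration}(t) \in [0, 1]
\end{aligned}
\tag{1}
\]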


This first quantifies over the states q in which the program variable 
authenticated is changed and then over the transitions t occurring after that 
state that correspond to a call of a program function called execute. Given this 
pair of q and t, the specification then states that if authenticated is mapped 
to True by q then the duration of the transition t is within the given range. 
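
To make this reading concrete, the following sketch checks the property over a small recorded run. The event encoding (dictionaries with `kind`, `var`, `func`, and timestamps) and the name `check_property` are assumptions made for illustration; this is a direct offline evaluation of the quantifiers, not how the tools presented later perform monitoring.

```python
def check_property(events):
    """Evaluate property (1) over a time-ordered list of events.

    Assumed (hypothetical) event shapes:
      {"kind": "change", "var": name, "value": v, "time": t}
      {"kind": "call", "func": name, "start": t0, "end": t1}
    Returns one (q_time, call_start, verdict) triple per (q, t) pair.
    """
    verdicts = []
    for i, q in enumerate(events):
        # Outer quantifier: states q in which `authenticated` changes.
        if q["kind"] != "change" or q["var"] != "authenticated":
            continue
        # Inner quantifier: calls of `execute` occurring after q.
        for t in events[i + 1:]:
            if t["kind"] != "call" or t["func"] != "execute":
                continue
            duration = t["end"] - t["start"]
            # Implication: antecedent false, or duration within [0, 1].
            ok = (q["value"] is not True) or (0 <= duration <= 1)
            verdicts.append((q["time"], t["start"], ok))
    return verdicts


run = [
    {"kind": "change", "var": "authenticated", "value": False, "time": 0.0},
    {"kind": "call", "func": "execute", "start": 1.0, "end": 3.0},
    {"kind": "change", "var": "authenticated", "value": True, "time": 4.0},
    {"kind": "call", "func": "execute", "start": 5.0, "end": 5.5},
    {"kind": "call", "func": "execute", "start": 6.0, "end": 8.0},
]
verdicts = check_property(run)
```

Only the last pair violates the property: `authenticated` was set to True at time 4.0 and the call starting at 6.0 takes 2 seconds.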


Syntax. Figure 1 gives the syntax of CFTL. CFTL specifications take prenex 
form consisting of a list of quantifiers followed by a quantifier-free part. The 
quantification domains are defined by $\Gamma_S$ (for states) and $\Gamma_T$ (for transitions). 
Terms produced by the $S$ and $T$ cases denote states and transitions respectively. 
We often drop the $S$ and $T$ subscripts from future and next when the meaning is 
clear from the context. The quantifier-free part of CFTL formulas is a boolean 
combination of atoms generated by $\varphi_A$. Let $A(\varphi)$ be the set of atoms of a CFTL 
formula $\varphi$ and, for $\alpha \in A(\varphi)$, let $\mathrm{var}(\alpha)$ be the variable on which $\alpha$ is based. 
In the above example $A(\varphi) = \{q(\mathit{authenticated}) = \mathrm{True},\ \mathrm{duration}(t) \in [0, 1]\}$, 
$\mathrm{var}(q(\mathit{authenticated}) = \mathrm{True}) = q$, and $\mathrm{var}(\mathrm{duration}(t) \in [0, 1]) = t$. A CFTL 
formula is well-formed if it does not contain any free variables (those not captured 
by a quantifier) and every nested quantifier depends on the previously quantified 
variable. 
Forall(q = changes('authenticated')).\
Forall(t = calls('execute', after='q')).\
Check(lambda q, t : (
    If(q('authenticated').equals(True)).then(
        t.duration()._in([0, 1])
    )
))


Fig. 2. An example of a CFTL specification written in Python using PyCFTL. 


Semantics. The semantics of CFTL is defined over a dynamic run of the program. 
A dynamic run is a sequence of states $\tau = \langle \sigma, t \rangle$, where $\sigma$ is a map 
(a partial function with finite domain) from program variables/functions to values 
and $t \in \mathbb{R}^{\geq}$ is a timestamp. Transitions are then pairs $\langle \tau_i, \tau_j \rangle$ for states $\tau_i$ 
and $\tau_j$. The product quantification domain over which a CFTL formula is evaluated 
is derived from the dynamic run using the quantifier list, e.g. by extracting 
all states where some variable changes. Elements of the product quantification 
domain are maps from specification variables to concrete states/transitions and 
will be referred to as concrete bindings. 


3 VyPR 


We now present VyPR, which can perform runtime verification on a single 
Python function with respect to some CFTL specification $\varphi$. Further details can 
be found in a paper [11] and technical report [10], and the tool is available online 
at http://cern.ch/vypr/. 


Tool Workflow. To runtime verify a Python function we follow the following 
steps. Firstly the property is captured as a CFTL specification using a Python 
binding called PyCFTL. Given this specification, VyPR instruments the input 
program so that the monitoring algorithm receives data from any points in the 
program that could contribute to a verdict. Finally, the modified program will 
communicate with the monitor at runtime, which will process the observations 
to produce a verdict. 


3.1 Writing CFTL Specifications with PyCFTL 


The first step is to write a CFTL specification. Note that such a specification is 
specific to a particular function being verified as it refers directly to the symbols 
in that function. For specification we provide PyCFTL, a Python binding for 
CFTL. Figure 2 shows the PyCFTL specification for the CFTL specification in 
Eq. 1. A CFTL specification is defined in PyCFTL in two parts: 


1. The first part is the quantification sequence. For example, the quantification 
$\forall q \in \mathrm{changes}(x)$ is given as Forall(q = changes('x')). 
2. The second part, the argument to Check(), gives the property to be evaluated 
for each concrete binding in the quantification domain. This is done 
by specifying a template for the specification with a lambda expression (an 
anonymous function in Python) whose arguments match the variables in the 
quantification sequence. 


3.2 Instrumenting for CFTL 


VyPR instruments a Python program for a CFTL specification $\varphi$ by building 
up the set Inst containing all points in the program that could contribute to the 
verdict of $\varphi$. VyPR works at the level of the abstract syntax tree (AST) of the 
program and the program points of interest are nodes in the AST. Once this set 
of nodes has been computed, the AST is modified to add instruments at each of 
these points. 

During runtime monitoring the most expensive operation is usually the 
lookup of the relevant monitor state that needs to be modified. To make moni- 
toring more efficient, our instrumentation algorithm computes Inst by computing 
a direct lookup structure that allows the monitoring algorithm to go directly to 
this state. This structure can be abstractly viewed as a tree, $H_\varphi$, whose leaves 
are sets that form a partition of Inst and whose intermediate nodes contain the 
information required to identify the relevant monitoring state. 

The first step in computing $H_\varphi$ is to construct the Symbolic Control-Flow 
Graph (SCFG) of the body of a (Python) function f. 


Definition 1. A symbolic control-flow graph (SCFG) is a directed graph 
$\langle V, E, v_s \rangle$ where $V$ is a finite set of symbolic states (maps from all program symbols, 
e.g. program variables/functions, to a status in {changed, unchanged, called, 
undefined}), $E \subseteq V \times V$ is a finite set of edges, and $v_s \in V$ is the initial symbolic 
state. 
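
As an illustration of Definition 1, the sketch below represents an SCFG as a small Python data structure together with a breadth-first reachability check. The class layout, the integer vertex identifiers, and the choice to treat every vertex as reachable from itself are assumptions of this sketch, not details taken from VyPR.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SCFG:
    """Directed graph (V, E, v_s); vertices are identified by integer ids."""
    states: dict = field(default_factory=dict)  # id -> {symbol: status}
    edges: dict = field(default_factory=dict)   # id -> list of successor ids
    initial: int = 0

    def add_state(self, vid, symbol_status):
        self.states[vid] = dict(symbol_status)
        self.edges.setdefault(vid, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def reaches(self, src, dst):
        """True if dst is reachable from src (breadth-first search)."""
        seen, frontier = {src}, deque([src])
        while frontier:
            current = frontier.popleft()
            if current == dst:
                return True
            for nxt in self.edges[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return False


# A three-state example: x is assigned, then f is called.
g = SCFG()
g.add_state(0, {"x": "undefined", "f": "undefined"})
g.add_state(1, {"x": "changed", "f": "unchanged"})
g.add_state(2, {"x": "unchanged", "f": "called"})
g.add_edge(0, 1)
g.add_edge(1, 2)
```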


The SCFG of a function $f$ is independent of any property $\varphi$ being checked. 
Our construction of the SCFG of a program encodes information about state 
changes (by symbolic states) and reachability (by edges being generated for 
each state-changing instruction in code), making it an ideal structure from 
which to derive candidate points for state changes. The SCFG is used to find all 
symbolic states or edges that could generate concrete bindings in the product 
quantification domain of a formula. For example, if the CFTL specification is 
$\forall q \in \mathrm{changes}(x) : q(x) < 10$, all symbolic states representing changes to $x$ will 
be identified as having potential to generate concrete bindings. From this, we 
construct a set of static bindings, which are maps from specification variables to 
candidate symbolic states/edges in the SCFG. The key distinction between con- 
crete and static bindings is that static bindings are computed from the SCFG 
before runtime, and can correspond to zero or more concrete bindings during 
runtime. We call the set of static bindings the binding space for $\varphi$ with respect 
to the SCFG and denote it by $B_\varphi$, with the SCFG implicit. Elements $\beta$ of $B_\varphi$ 
form the top level of the tree $H_\varphi$. 
Data: $\varphi$ and the SCFG $\langle V, E, v_s \rangle$ of function $f$
Result: Lookup tree $H_\varphi$
// Construct $B_\varphi$
$B_\varphi = \{\emptyset\}$;
foreach quantified variable ($x_i \in$ predicate) in $\varphi$ in order do
    for $v \in V$ do
        if $v$ is a candidate for predicate then
            $B_\varphi = \{\beta \cup [x_i \mapsto v] \mid \beta \in B_\varphi \land i > 1 \rightarrow \mathrm{reaches}(\beta(x_{i-1}), v)\}$;
        end
    end
end
// Construct $H_\varphi$
$H_\varphi = \emptyset$;
for $\beta \in B_\varphi$ with index $i_\beta$ do
    for quantified variable $x_i$ in $\varphi$ with index $i_\forall$ do
        foreach $\alpha \in \{\alpha \in A(\varphi) \mid \mathrm{var}(\alpha) = x_i\}$ with index $i_\alpha$ do
            $H_\varphi(i_\beta, i_\forall, i_\alpha) = \mathrm{lift}(\alpha, \beta(x_i))$;
        end
    end
end


Algorithm 1: VyPR’s algorithm for construction of the tree $H_\varphi$.


Once $B_\varphi$ is constructed, for each $\beta \in B_\varphi$, VyPR lifts each $\alpha \in A(\varphi)$ (the 
atoms of $\varphi$) from the dynamic context to the SCFG in order to find the relevant 
symbolic states/edges around the symbolic state/edge $\beta(\mathrm{var}(\alpha))$. This process 
constructs the second and third levels of the tree $H_\varphi$: the second level consisting 
of variables, and the third level of atoms in $A(\varphi)$. The leaves on the fourth level 
of the tree $H_\varphi$ are then the subsets of Inst: sets of symbolic states or edges from 
the SCFG. 

Whilst we can abstractly view $H_\varphi$ as a tree, in practice we represent it as 
a map from triples $\langle i_\beta, i_\forall, i_\alpha \rangle$ to symbolic states/edges of the SCFG where $i_\beta$, 
$i_\forall$ and $i_\alpha$ are indices into the binding space, quantifier list, and set of atoms 
respectively. An instrument placed in the input program for an atom $\alpha$, using 
$H_\varphi$, contains a triple to identify a subset of Inst and a value obs which is whatever 
code is required to obtain the value necessary to compute a truth value for $\alpha$. 
For example, if the instrument is being placed to record the value of a program 
variable, obs is the name of the variable which, at runtime, is evaluated to give 
the value the variable holds. Such an instrument, which pushes its triple and 
evaluated obs value to a queue to be consumed by the monitoring thread, is 
placed by modifying the Abstract Syntax Tree (AST) of the program. 
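
The idea of placing such an instrument by rewriting the AST can be sketched with Python's standard `ast` module. The example below inserts, after every assignment to a chosen variable, a statement that pushes a fixed triple and the variable's runtime value onto a queue. The class name `AssignInstrumenter`, the queue name `_observations`, and the restriction to simple assignments are assumptions of this sketch; VyPR's actual instrumentation is driven by the SCFG and the lookup tree rather than by variable names alone.

```python
import ast
import queue

SOURCE = """
def f():
    x = 1
    x = x + 41
    return x
"""


class AssignInstrumenter(ast.NodeTransformer):
    """After each `name = ...` statement, record (triple, value of name)."""

    def __init__(self, name, triple):
        self.name = name
        self.triple = triple

    def visit_Assign(self, node):
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if self.name not in targets:
            return node
        # obs is the variable name; it is evaluated when the program runs.
        record = ast.parse(
            f"_observations.put(({self.triple!r}, {self.name}))"
        ).body[0]
        return [node, record]  # keep the assignment, then add the instrument


tree = AssignInstrumenter("x", (0, 0, 0)).visit(ast.parse(SOURCE))
tree = ast.fix_missing_locations(tree)
namespace = {"_observations": queue.Queue()}
exec(compile(tree, "<instrumented>", "exec"), namespace)
result = namespace["f"]()

observed = []
while not namespace["_observations"].empty():
    observed.append(namespace["_observations"].get())
```

The instrumented function still returns its original result, while the queue holds one observation per assignment to `x`.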

Our algorithm for construction of $H_\varphi$ is Algorithm 1. This makes use of a 
predicate reaches which checks whether one symbolic state is reachable from 
another in the SCFG; and a function $\mathrm{lift}(\alpha, v)$ for $\alpha \in A(\varphi)$ and $v \in V$ which 
gives the symbolic states reachable from $v$ obtained by lifting $\alpha$ to the static 
context. With the tree $H_\varphi$ and binding space $B_\varphi$ defined, in the next section we 
present our monitoring approach. 
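
One possible reading of Algorithm 1 in Python is sketched below. The helpers `is_candidate`, `reaches`, `var_of`, and `lift` are passed in as parameters because their definitions are not given in this section; their signatures here are assumptions. The sketch also accumulates the extended bindings over all candidate vertices for each quantifier, which is an interpretation of the set-comprehension step rather than a transcription of it.

```python
def build_binding_space(quantifiers, vertices, is_candidate, reaches):
    """quantifiers: ordered list of (variable, predicate) pairs.

    Returns a list of static bindings, each a dict variable -> vertex.
    """
    bindings = [{}]  # start from the single empty binding
    for i, (var, predicate) in enumerate(quantifiers):
        extended = []
        for beta in bindings:
            for v in vertices:
                if not is_candidate(v, predicate):
                    continue
                # After the first quantifier, v must be reachable from the
                # vertex bound to the previously quantified variable.
                if i > 0 and not reaches(beta[quantifiers[i - 1][0]], v):
                    continue
                extended.append({**beta, var: v})
        bindings = extended
    return bindings


def build_lookup(bindings, quantifiers, atoms, var_of, lift):
    """Return a map (i_beta, i_forall, i_alpha) -> set of SCFG vertices."""
    lookup = {}
    for i_beta, beta in enumerate(bindings):
        for i_forall, (var, _) in enumerate(quantifiers):
            for i_alpha, atom in enumerate(atoms):
                if var_of(atom) == var:
                    lookup[(i_beta, i_forall, i_alpha)] = set(lift(atom, beta[var]))
    return lookup


# Toy inputs: vertex 1 changes a variable, vertex 2 calls a function.
quantifiers = [("q", "changes"), ("t", "calls")]
candidates = {"changes": {1}, "calls": {2}}
bindings = build_binding_space(
    quantifiers,
    vertices=[0, 1, 2],
    is_candidate=lambda v, predicate: v in candidates[predicate],
    reaches=lambda src, dst: src <= dst,
)
lookup = build_lookup(
    bindings,
    quantifiers,
    atoms=["q_atom", "t_atom"],
    var_of=lambda atom: atom[0],
    lift=lambda atom, v: [v],
)
```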
3.3 Monitoring for CFTL 


The modified version of the body of f resulting from instrumentation is run 
alongside VyPR’s monitoring algorithm, which consumes data from instruments 
via a consumption queue populated by the main program thread. Monitoring is 
performed asynchronously. VyPR’s monitoring algorithm involves instantiating 
a formula tree (an and-or tree) for each binding in the quantification domain 
of a formula. This algorithm uses the triple $\langle i_\beta, i_\forall, i_\alpha \rangle$ and evaluated obs value 
given by each instrument to perform lookup (to find in which formula trees to 
update the truth value of a specific atom), decide if new formula trees should be 
instantiated, and compute the truth value of the atom at index $i_\alpha$ in $A(\varphi)$. 

Given a CFTL formula $\forall q_1 \in \Gamma_1, \dots, \forall q_n \in \Gamma_n : \psi(q_1, \dots, q_n)$, when 
monitoring one can interpret multiple quantification as single quantification 
over a product space $\Gamma_1 \times \dots \times \Gamma_n$. Such a space contains concrete bindings 
$[q_1 \mapsto v_1, \dots, q_n \mapsto v_n]$ for states or transitions $v_i$. Each of these concrete bindings 
generated at runtime corresponds to a single static binding $\beta \in B_\varphi$. Using 
this correspondence, we say that each concrete binding has a supporting static 
binding $\beta \in B_\varphi$. 

Given that monitoring is performed by instantiating a formula tree for each 
concrete binding in the product quantification domain, the speed of lookup of 
relevant formula trees is greatly increased by grouping them by the indices of 
supporting static bindings (determined by $i_\beta$). Hence, to either update or instantiate 
formula trees, when information is observed from an instrument that helps 
to evaluate $\psi$ at some concrete binding, the supporting static binding must be 
found, giving rise to the requirement for static information during monitoring. 
During monitoring, lookup of which set of formula trees to use is straightforward 
since the index $i_\beta$ is given by the instrument. 

Once lookup has been performed, the result is a set of formula trees corresponding 
to the static binding index $i_\beta$ received from the instrument. From here, 
the index $i_\alpha$ is used to determine the atom in $A(\varphi)$ whose truth value (computed 
using the value given by obs) must be updated in each formula tree. 
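
The asynchronous consumption and direct lookup described above can be sketched with a queue and a worker thread. This is a deliberately reduced illustration: it records atom truth values per static binding index and omits formula-tree instantiation and verdict computation. The names `consume` and `atom_checks`, and the use of `None` as an end-of-run sentinel, are assumptions of the sketch.

```python
import queue
import threading


def consume(observations, atom_checks, state):
    """Consume ((i_beta, i_forall, i_alpha), value) pairs until None arrives.

    atom_checks[i_alpha] maps an observed value to the atom's truth value.
    state maps i_beta -> {i_alpha: truth value}.
    """
    while True:
        item = observations.get()
        if item is None:  # sentinel: the monitored function has finished
            return
        (i_beta, i_forall, i_alpha), value = item
        # Direct lookup: the instrument already names the static binding,
        # so no search over monitor state is needed. i_forall is unused in
        # this reduced sketch.
        state.setdefault(i_beta, {})[i_alpha] = atom_checks[i_alpha](value)


observations = queue.Queue()
state = {}
atom_checks = [
    lambda value: value is True,        # q(authenticated) = True
    lambda duration: 0 <= duration <= 1,  # duration(t) in [0, 1]
]
worker = threading.Thread(target=consume, args=(observations, atom_checks, state))
worker.start()

# The main program thread plays the role of the instrumented function.
observations.put(((0, 0, 0), True))
observations.put(((0, 1, 1), 0.4))
observations.put(((1, 1, 1), 2.5))
observations.put(None)
worker.join()
```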


3.4 Verdict Reports 


Once execution has finished, a verdict report is generated, which VyPR keeps 
in memory. Since each formula tree corresponds to a single concrete binding, 
verdicts share concrete bindings’ correspondence with static bindings. Hence, 
verdicts can be grouped by the supporting static bindings. Given the binding 
space $B_\varphi$ computed during instrumentation, a verdict report $V$ from a single run 
of a function can be seen as a partial function 


\[
V : B_\varphi \to (\{\top, \bot\} \times \mathbb{R}^{\geq})^{*},
\]


sending a static binding $\beta \in B_\varphi$ to a sequence of pairs containing a verdict 
from $\{\top, \bot\}$ and a timestamp (the time at which the verdict was obtained). 
The map $V$ sends static bindings to sequences of pairs, rather than single pairs, 
because single static bindings can support multiple concrete bindings, generating 
multiple verdicts. This is the case if, for example, the static binding is inside a 
loop that iterates more than once at runtime. 
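
A verdict report of this shape can be sketched as a dictionary from static binding indices to lists of (verdict, timestamp) pairs. The class name `VerdictReport` and its methods are hypothetical and only mirror the partial function described above.

```python
from collections import defaultdict


class VerdictReport:
    """Maps a static binding index to a sequence of (verdict, timestamp)."""

    def __init__(self):
        self._verdicts = defaultdict(list)

    def add(self, binding_index, verdict, timestamp):
        # One static binding may support several concrete bindings, for
        # example inside a loop, so verdicts are appended rather than replaced.
        self._verdicts[binding_index].append((verdict, timestamp))

    def get(self, binding_index):
        # Partial function: bindings that produced no verdict yield None.
        return self._verdicts.get(binding_index)


report = VerdictReport()
report.add(0, True, 0.1)
report.add(0, False, 0.2)
```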


4 An Architecture for Web Service Verification 


We begin our description of the architecture of VyPR2, the extension of VyPR 
to web services, by isolating a number of requirements imposed by web ser- 
vice deployment environments, and production software environments in general, 
that must be met. 

The environment at CERN inside which our verification infrastructure must 
function is similar to most production environments. It consists of machines 
for development and production, with each machine automatically pulling the 
relevant tags from a central repository once engineers have pushed their (locally- 
tested) code. Based on this deployment architecture, and the architecture of web 
services, requirements for our Runtime Verification framework include: 


Centralised specifications over multiple functions with multiple properties. It 
should be possible to verify each function in a web service with respect to multi- 
ple properties. Further, specifications for the whole web service should be written 
in a single file, to minimise intrusion into the web service’s code. 


Making instrumentation data persistent. Web services’ code can be pulled from 
a repository onto a production server and, once launched, be restarted multi- 
ple times between successive deployments of different code versions. Therefore, 
instrumentation data must be persistent between processes. 


Persistent verdict data. Similarly, verdict data must be persistent and, further- 
more, engineers must be able to perform offline analysis of the verdicts reached 
by web services at runtime. 

An architecture that meets these requirements is illustrated in Fig. 3, and 
described in the following sections. The resulting tool, VyPR2, will soon be 
publicly available from http://cern.ch/vypr. 


4.1 Specifying Multiple Function, Multiple Property Specifications 


For simplicity of use, we have opted to have engineers write their entire specifi- 
cation in a central configuration file, in the root directory of their web service. 
This is a file written in Python, specifying CFTL properties over the service 
using the PyCFTL library. 

Part of such a configuration file, using the PyCFTL specification given in 
Fig. 2, is shown in Fig. 4: one must first give the fully-qualified name of the mod- 
ule in the service in standard Python dot notation and then, for each function, 
the list of properties built up using PyCFTL. 
Fig. 3. The architecture of VyPR extended to web services. 


\[
\begin{aligned}
\varphi_{\mathrm{auth}} = {} & \forall q \in \mathrm{changes}(\mathit{authenticated}) : \\
& \quad \forall t \in \mathrm{future}(q, \mathrm{calls}(\mathit{execute})) : \\
& \qquad q(\mathit{authenticated}) = \mathrm{True} \implies \mathrm{duration}(t) \in [0, 1]
\end{aligned}
\]


"app.metadata_handler" : {
    "MetadataHandler.__init__" : [
        Forall(q = changes('authenticated')).\
        Forall(t = calls('execute', after='q')).\
        Check(lambda q, t : (
            If(q('authenticated').equals(True)).then(
                t.duration()._in([0, 1])
            )
        ))


Fig. 4. A CFTL specification and its PyCFTL equivalent. 


4.2 Instrumentation 


Given a specification such as that in Fig. 4, VyPR’s strategy must be extended 
to the multiple function, multiple property context. Multiple functions are dealt 
with by constructing the SCFG for each function found in the specification and 
performing instrumentation for each property. 

Instrumentation for each property over the same function is performed 
sequentially: VyPR2 instruments using the AST of the input code, and so instru- 
mentation for each property progressively modifies the AST. 

We now describe the modifications required to the actual instruments. In 
VyPR’s simplified setting, instruments need only send the $\langle i_\beta, i_\forall, i_\alpha \rangle$ triple along 
with the obs value relevant to the atom for which the instrument was placed. 
The multiple function, multiple property setting yields several problems that are 
solved by modifying existing instruments and adding a new kind. 
In our architecture, monitoring is performed by a single thread, which means 
that this thread must have a way to distinguish between instruments received 
from different functions. We accomplish this by adding the name of the function 
to all instruments added to code. By adding the name of the function to all 
instruments, we deal not only with multiple functions, but with monitored func- 
tions calling other monitored functions, in which case monitor states for multiple 
functions must be maintained at the same time. 

We deal with multiple properties over the same function by adding a unique 
identifier of a property to each of its instruments. We compute a uniquely identi- 
fying string for each property by taking the SHA1 hash of the combination of the 
quantification sequence and the template. We add this unique identifier to each 
instrument, giving the monitoring algorithm a way to distinguish properties. 
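
A minimal sketch of such an identifier, using Python's standard `hashlib`, is shown below. How the quantification sequence and template are serialised and combined before hashing is not specified here, so the string inputs and the `|` separator are assumptions.

```python
import hashlib


def property_id(quantification_sequence: str, template: str) -> str:
    """Return a SHA1 hex digest identifying a property.

    Both arguments are assumed to be textual forms of the two parts of a
    PyCFTL specification; the separator is an arbitrary choice.
    """
    combined = quantification_sequence + "|" + template
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()


example_id = property_id("Forall(q = changes('x'))", "q('x') < 10")
```

The digest is deterministic, so the same property yields the same identifier at instrumentation time and at monitoring time.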

Taking the original triple $\langle i_\beta, i_\forall, i_\alpha \rangle$, the appropriate obs code, and the new 
requirements for the function name and the property hash, the new form of 
instruments that are placed by VyPR2 is $\langle \mathrm{function}, \mathrm{hash}, \mathrm{obs}, i_\beta, i_\forall, i_\alpha \rangle$. 


4.3 Making Instrumentation Data Persistent 


The tree $H_\varphi$ is dependent on the CFTL formula $\varphi$ for which it has been computed. 
Hence, if the specification for a given function in the web service consists 
of a set $\Phi = \{\varphi_1, \dots, \varphi_n\}$ of CFTL formulas, the data required to monitor each 
property at the same time over the same execution of the given function consists 
of the set of maps $H_{\varphi_i}$, which can be identified by $\varphi_i$. In particular, when data is 
received from an instrument by the monitoring algorithm, we can assume from 
Sect. 4.2 that it will contain a unique identifier for the formula for which it was 
placed. Therefore, the correct tree $H_{\varphi_i}$ can be determined for each instrument. 
We make such instrumentation data persistent by creating new directories in 
the root of the web service called binding_spaces and instrumentation_maps 
to hold the binding spaces and trees, respectively, computed for each func- 
tion/CFTL property combination. To dump the binding spaces and hierarchy 
functions in files in these directories, we use Python’s pickle [13] module. 
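
The persistence step can be sketched with `pickle` as follows. The directory names come from the text above; the file naming scheme (function name and property hash joined by a hyphen) and the function names `dump_instrumentation` and `load_instrumentation` are assumptions made for illustration.

```python
import os
import pickle


def _path(root, directory, function, prop_hash):
    # Hypothetical naming scheme: one file per function/property combination.
    return os.path.join(root, directory, f"{function}-{prop_hash}.dump")


def dump_instrumentation(root, function, prop_hash, binding_space, tree):
    """Serialise a binding space and its tree under the service root."""
    for directory, data in (("binding_spaces", binding_space),
                            ("instrumentation_maps", tree)):
        os.makedirs(os.path.join(root, directory), exist_ok=True)
        with open(_path(root, directory, function, prop_hash), "wb") as handle:
            pickle.dump(data, handle)


def load_instrumentation(root, function, prop_hash):
    """Read back the (binding space, tree) pair written by the function above."""
    loaded = []
    for directory in ("binding_spaces", "instrumentation_maps"):
        with open(_path(root, directory, function, prop_hash), "rb") as handle:
            loaded.append(pickle.load(handle))
    return tuple(loaded)
```

Because the files live on disk, a restarted service process can reload them without repeating instrumentation.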


4.4 Activating Verification in a Web Service 


Our infrastructure is designed to minimise intrusion, both by minimising the 
amount of instrumentation performed and by minimising the amount of code 
engineers must add to their services for verification to be performed. 

With the Flask-based implementation of VyPR2 that we present here, one 
can activate verification by adding the lines from vypr import Verification 
and verification = Verification(app) where app is the Flask application 
object required when building a web service with the Flask framework. 

Running verification = Verification(app) will start up the separate 
monitoring thread, similar to VyPR, and will also read the serialised binding 
spaces and trees from the directories described in Sect. 4.3. It will subsequently 
place them in a map G from (module.function, property hash) pairs to objects 
containing the unserialised forms of the binding spaces and trees. 
4.5 A Modified Monitoring Algorithm 


VyPR’s algorithm uses the tuple $\langle i_\beta, i_\forall, i_\alpha \rangle$ with $H_\varphi$ to determine the set of 
formula trees to update. In this case, $H_\varphi$ is fixed. However, in the web service 
setting, the additional information regarding the current function that has 
control and the property to update is present and required to find the correct 
binding space and tree given by $G$. From here the process is the same as that used 
by VyPR, since the monitoring problem has once again collapsed to monitoring 
a single property over a single function. 


4.6 A Verdict Server 


For a CFTL formula $\forall q_1 \in \Gamma_1, \dots, \forall q_n \in \Gamma_n : \psi(q_1, \dots, q_n)$ over a function $f$, 
we use verdicts to refer to the sequence of truth values in $(\{\top, \bot\} \times \mathbb{R}^{\geq})^{*}$, where 
$\psi(q_1, \dots, q_n)$ generates a truth value in $\{\top, \bot\}$ for each binding in $\Gamma_1 \times \dots \times \Gamma_n$ 
at a time $t \in \mathbb{R}^{\geq}$. To store such verdicts from a specification written over 
a web service, we now present the most substantial modification to VyPR’s 
architecture: a central server to collect verdicts. This is, in itself, a separate 
system; communication with it takes place via HTTP. It consists of two major 
components: 


— The server, a Python program that provides an API both for verdict inser- 
tion by the monitoring algorithm and for querying by a front-end for verdict 
visualisation. 

— A relational database whose schema is derived from that of the tree $H_\varphi$. 


We omit further discussion of the server and first state some facts regarding 
our relational schema. Functions and properties are paired, so multiple properties 
over a single function yield multiple pairs; HTTP requests are used to group 
function calls; function calls correspond to function/property pairs; and verdicts 
are organised into bindings belonging to a function/property pair. With these 
facts in mind, one can answer questions such as: 


— “For a given HTTP request, function and property $\varphi$ combination, what were 
the verdicts generated by monitoring $\varphi$ across all calls?” 

— “For a given verdict and subsystem, which function/property pairs generated 
the verdict?” 

— “For a given function call and verdict, which lines were part of bindings that 
generated this verdict while monitoring some property $\varphi$?” 
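
To illustrate how such questions map onto a relational schema, the sketch below builds a small in-memory SQLite database and answers the first question. All table and column names are hypothetical; they follow the facts stated above (function/property pairs, HTTP requests grouping function calls, verdicts attached to bindings) but are not the schema used by the verdict server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE function_property_pair (
    id INTEGER PRIMARY KEY, function_name TEXT, property_hash TEXT);
CREATE TABLE http_request (
    id INTEGER PRIMARY KEY, time_of_request REAL);
CREATE TABLE function_call (
    id INTEGER PRIMARY KEY,
    pair_id INTEGER REFERENCES function_property_pair(id),
    http_request_id INTEGER REFERENCES http_request(id));
CREATE TABLE binding (
    id INTEGER PRIMARY KEY,
    pair_id INTEGER REFERENCES function_property_pair(id),
    binding_index INTEGER);
CREATE TABLE verdict (
    id INTEGER PRIMARY KEY,
    binding_id INTEGER REFERENCES binding(id),
    function_call_id INTEGER REFERENCES function_call(id),
    truth_value INTEGER, time_obtained REAL);
"""

QUERY = """
SELECT verdict.truth_value, verdict.time_obtained
FROM verdict
JOIN function_call ON verdict.function_call_id = function_call.id
JOIN function_property_pair ON function_call.pair_id = function_property_pair.id
WHERE function_call.http_request_id = ?
  AND function_property_pair.function_name = ?
  AND function_property_pair.property_hash = ?
ORDER BY verdict.time_obtained
"""


def verdicts_for(connection, http_request_id, function_name, property_hash):
    """Verdicts for one HTTP request and function/property pair, in time order."""
    cursor = connection.execute(
        QUERY, (http_request_id, function_name, property_hash))
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.executescript(SCHEMA)
connection.execute(
    "INSERT INTO function_property_pair VALUES (1, 'app.routes.check_hashes', 'abc')")
connection.execute("INSERT INTO http_request VALUES (1, 10.0)")
connection.execute("INSERT INTO function_call VALUES (1, 1, 1)")
connection.execute("INSERT INTO binding VALUES (1, 1, 0)")
connection.executemany(
    "INSERT INTO verdict VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10.5), (2, 1, 1, 0, 11.5)])
rows = verdicts_for(connection, 1, "app.routes.check_hashes", "abc")
```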


5 An Application: The CMS Conditions Uploader 


We now present the details of the application of VyPR2 to the CMS Condi- 
tions Upload Service. We begin by introducing the data with which the CMS 
Conditions Upload Service works. We then give a brief overview of the existing 
performance analysis approaches taken at CERN, before describing our app- 
roach for replaying real data from LHC runs. Finally, we give our specification 
and present an analysis of the verdicts derived by monitoring the Conditions 
Uploader with input taken from our test data, consisting of on the order of $10^4$ 
inputs recorded during LHC runs. 


5.1 Conditions Data, Their Computation and Upload 


CERN is home to the Large Hadron Collider (LHC) [14], the largest and most 
powerful particle accelerator ever built. At one of the interaction points on the 
LHC beamline lies the Compact Muon Solenoid (CMS) [15], a general purpose 
detector which is a composite of sub-detector systems. Physics analysis at CERN 
requires reconstruction, a process whose input consists of both Event (collisions) 
and Non-Event (alignment and calibrations, or Conditions) data. The lifecycle 
of Conditions data begins with its computation during LHC runs, and ends 
with its upload to a central Conditions database. The service responsible for 
this upload is the CMS Conditions Upload service, a precise understanding of 
the performance of which is vital given planned upgrades to the LHC that will 
increase the amount of data taken. 

The Conditions data used in reconstruction by CMS must define (1) the 
alignment and calibrations constants associated with a particular subdetector 
of CMS and (2) the time (run of the LHC) during which those constants are 
valid. The atomic unit of Conditions is the Payload, which is a serialised C++ 
class whose fields are specific to the subdetector of CMS to which the class 
corresponds. We define when a Payload applies to the subdetector by associating 
with it an Interval of Validity (IOV). We then group IOVs into sequences by 
defining Tags, which define to which subdetector each Payload associated with 
the IOVs it contains applies. 

The CMS Conditions Uploader is used for release of Conditions by the auto- 
mated Conditions computation that takes place at Tier 0 [16] (CERN’s local 
computing grid) and detector experts who require their own Conditions. The 
Uploader is responsible for checking whether the Conditions proposed are valid 
before inserting the Conditions into the central database. 


5.2 A Specification 


We now give the specification with which we tested the Upload service on the 
upload data we collected, along with an interpretation for each property. These 
were written in collaboration with engineers working on the service. 


1. app.usage.Usage.new_upload_session 

   $\forall q \in \mathrm{changes}(\mathit{authenticated}) : \forall t \in \mathrm{future}(q, \mathrm{calls}(\mathit{execute})) : q(\mathit{authenticated}) = \mathrm{True} \implies \mathrm{duration}(t) \in [0, 1]$ 

   Whenever authenticated is changed, if it is set to True, then all future 
   calls to execute should take no more than 1 second. 

2. app.routes.check_hashes 

   $\forall q \in \mathrm{changes}(\mathit{hashes}) : \mathrm{duration}(\mathrm{next}(q, \mathrm{calls}(\mathit{find\_new\_hashes}))) \in [0, 0.3]$ 

   When the variable hashes is assigned, the next call to find_new_hashes 
   should take no more than 0.3 seconds. 

3. app.routes.store_blobs 

   $\forall t \in \mathrm{calls}(\mathit{con.execute}) : \mathrm{duration}(t) \in [0, 2]$ 

   Every call to the con.execute method on the current database connection 
   should take no more than 2 seconds. 

4. app.metadata_handler.MetadataHandler.__init__ 

   $\forall t \in \mathrm{calls}(\mathit{insert\_iovs}) : \mathrm{duration}(\mathrm{next}(t, \mathrm{calls}(\mathit{commit}))) \in [0, 1]$ 

   Every time the method insert_iovs is called, the next commit after the 
   insertion should take no more than 1 second. 

5. app.routes.upload_metadata 

   $\forall t \in \mathrm{calls}(\mathit{MetadataHandler}) : \mathrm{duration}(t) \in [0, 1]$ 

   Every time MetadataHandler is instantiated, the instantiation should 
   take no more than 1 second. 
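
To make the shape of these properties concrete, the following minimal Python 
sketch checks a property of the same form as property 3 over a list of recorded 
calls. It is not VyPR2's API or its monitoring algorithm: the function name 
duration_violations and the (name, start, end) record format are assumptions 
made purely for illustration. 

```python
def duration_violations(calls, function, lower, upper):
    """Return the recorded calls of `function` whose duration is outside [lower, upper].

    `calls` is a list of (function_name, start_time, end_time) tuples. This
    record format is an assumption of the sketch, not VyPR2's internal one.
    """
    return [
        (name, start, end)
        for (name, start, end) in calls
        if name == function and not (lower <= end - start <= upper)
    ]


# Hypothetical timing data, in seconds.
recorded_calls = [
    ("con.execute", 0.0, 1.5),   # 1.5 s, within [0, 2]
    ("con.execute", 2.0, 4.5),   # 2.5 s, outside [0, 2]
    ("commit", 5.0, 5.25),       # a different function, ignored by this check
]

violating = duration_violations(recorded_calls, "con.execute", 0, 2)
```

Each element of violating corresponds to one binding of t for which the 
duration constraint fails. 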


5.3 Analysis of Verdicts 


We present our analysis of the Conditions uploader with respect to the specifi- 
cation in Sect. 5.2. The analysis is performed in two parts: 


1. Complete Replay - performing a complete upload replay of 14,610 uploads 
collected over a period of 7 months. The time between uploads in this part is 
fixed. 

2. Single Tag Replay - performing a smaller upload replay of ~ 900 uploads 
based on a single Tag. This part is a subset of the first, but where the time 
between uploads is varied. 


Complete Replay. Figure 5 shows the results of monitoring our specification over 
a dataset of 14,610 uploads. The x axis is function/property pair IDs from the 
verdict database snapshot used to generate the plot. The ID to property corre- 
spondence is such that ID 99 refers to property 1; ID 100 to property 2; ID 101 
to property 3; ID 102 to property 4; and ID 103 to property 5. Clearly, from 
this plot, the violations of property 2 exceed those caused by other properties by 
an order of magnitude. The check_hashes function carries out an optimisation 
that we call hash checking, used to make sure that a Conditions upload only 
sends the Payloads that are not already in the target Conditions database. This 
is possible because Payloads are uniquely identifiable by their hashes. This 
optimisation reduces the time spent on Payload uploads by an order of magnitude 
[12], but the frequency of violation in Fig. 5 suggests that the optimisation itself 
may be causing unacceptable latency. 
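
The idea behind hash checking can be sketched as follows. This is an 
illustration only: find_new_hashes is the function name that appears in 
property 2, but its signature here, the helper payload_hash, the use of SHA-1, 
and the example blobs are all assumptions rather than the Uploader's actual 
code. 

```python
import hashlib


def payload_hash(blob):
    """Hash a serialised Payload (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(blob).hexdigest()


def find_new_hashes(proposed, stored):
    """Return the proposed hashes that are not yet in the target database."""
    stored = set(stored)
    return [h for h in proposed if h not in stored]


# Hypothetical database content and upload.
database = {payload_hash(b"pixel-alignment-v1")}
upload = [b"pixel-alignment-v1", b"pixel-alignment-v2"]

# Only the Payloads behind these hashes need to be sent to the server.
to_send = find_new_hashes([payload_hash(b) for b in upload], database)
```

Only one of the two Payloads is new, so only its hash is returned and only 
that Payload would have to be transferred. 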


Single Tag Replay. Figure 6 shows the results of monitoring a subset of our 
specification over a dataset of ~ 900 uploads from a single Tag in the Conditions 
database. In this case, the x axis is runs of this upload dataset performed with 
varying delays between uploads, and the y axis is the number of violations based 
on a specification with 3 properties. This plot is of interest because, for the ~ 300 
Payloads inserted during this replay, it shows that the latency experienced by 
those insertions (in terms of violations of property 3, shown in orange) decreases 
as the delay between uploads increases. 


5.4 Resulting Investigation 


Based on the observations presented in Sect. 5.3, we have made investigation 
of the number of violations caused by hash checking a priority. It is recognised 
that this process is required, and its addition to the Conditions Uploader was a 
significant optimisation, but the optimisation can only be considered as such if 
it does not introduce unacceptable overhead to the upload process. 

It is also clear that we should understand the pattern of violations in Fig. 6 
more precisely. Given that the Conditions Uploader must operate successfully 
with both the current and upgraded LHC, it is a priority to understand the 
behaviour of the Uploader under varying frequencies of uploads. We suspect 
that investigation into the pattern seen in Fig. 6 will result in modification of 
either the Conditions Uploader’s code, or the way in which Conditions are sent 
for upload during LHC runs. 


5.5 Performance 


We now describe the time and space overhead induced by using VyPR2 to 
monitor the specification in Sect. 5.2 over the Conditions Uploader. We consider 
both the time overhead on a single upload, and the space required to store 
intermediate instrumentation data. 

When measuring the time overhead induced over a single upload, we found that 
running our complete upload dataset in a small period of time resulted in 
erratic database latency (the dataset was recorded over 7 months), so we 
opted to run a single upload 10 times with and without monitoring. 
This provided a more realistic upload scenario, and allowed us to see 
the overhead induced with respect to a single upload process (the process varies 
depending on the Conditions being uploaded). The result, from 10 runs of the 
same upload, was an average time overhead of 4.7%. Uploads are performed by a 
client sending the Conditions to the upload server over multiple HTTP requests, 
so this overhead is measured starting from when the first request is received by 
the upload server to when the last response is sent. 

The space required to store all of the necessary instrumentation data for the 
specification in Sect. 5.2 is divided into space for binding spaces 
($B_\varphi$), instrumentation maps ($H_\varphi$) and indices (a map from 
property hashes to the position in the specification at which they are found). 
The binding spaces took up 170 KB, the instrumentation maps 173 KB and the 
index map 4.3 KB, giving a total space overhead for instrumentation data 
storage of 347.3 KB. 


6 Related Work 


To the best of our knowledge, there is no existing work on Runtime Verification 
of web services. We are also unaware of other (available and maintained) RV 
tools for Python (there is Nagini [17], but this focuses on static verification) as 
most either operate offline (on log files) or focus on other languages such as Java 
[5,7,18] using AspectJ for instrumentation, C [19], or Erlang [20]. Few RV tools 
consider the instrumentation problem within the tool. The main exception is 
Java-MaC [3], which also uses the specification to rewrite the Java code directly. 


High-Energy Physics. In High Energy Physics, any form of monitoring concen- 
trates on instrumentation in order to carry out manual inspection. For exam- 
ple, the instrumentation and subsequent monitoring of CMS’ PHEDEX system 
for transfer of physics data was performed [21] and resulted in the identifica- 
tion of areas in which latency could be improved. Closer to the case study we 
present here, CMS uses the PCLMON tool to monitor Conditions computation 
[22]. Finally, the Frontier query caching system performs offline monitoring by 
analysing logs [23]. None of these approaches uses a formal specification lan- 
guage, and they all collect a single type of statistics for a single defined use case. 
In contrast, VyPR2 is configurable in the sense that one can change the 
specification being checked using our formal specification language, CFTL. 


7 Conclusion 


We have introduced the VyPR tool for monitoring single-threaded Python pro- 
grams with respect to CFTL specifications, expressed using the PyCFTL library 
for Python. We then highlighted the problems that one must solve to extend 
VyPR’s architecture to the web service setting, and presented the VyPR2 
framework which implements our solutions. VyPR2 is a complete Runtime Ver- 
ification framework for Flask-based web services written in Python; it provides 
the PyCFTL library for writing CFTL specifications over an entire web service, 
automatic minimal (with respect to reachability) instrumentation and efficient 
monitoring. Finally, we have described our experience using VyPR2 to analyse 
performance of the CMS Conditions Uploader, a critical part of the physics 
reconstruction pipeline of the CMS Experiment at CERN. 

With the large amount of test data we have at CERN, we plan to extend 
VyPR2 to address explanation of violations of any part of a specification. This 
has been agreed within the CMS Experiment as being a significant step in 
developing the necessary software analysis tools ready for the upgraded LHC. 


Abstract. Verifying hyperproperties at runtime is a challenging problem 
as hyperproperties, such as non-interference and observational determinism, 
relate multiple computation traces with each other. It is necessary 
to store previously seen traces, because every new incoming trace 
needs to be compatible with every run of the system observed so far. Fur- 
thermore, the new incoming trace poses requirements on future traces. In 
our monitoring approach, we focus on those requirements by rewriting a 
hyperproperty in the temporal logic HyperLTL to a Boolean constraint 
system. A hyperproperty is then violated by multiple runs of the system if 
the constraint system becomes unsatisfiable. We compare our implemen- 
tation, which utilizes either BDDs or a SAT solver to store and evaluate 
constraints, to the automata-based monitoring tool RVHyper. 


Keywords: Monitoring · Rewriting · Constraint-based · Hyperproperties 


1 Introduction 


As today’s complex and large-scale systems are usually far beyond the scope 
of classic verification techniques like model checking or theorem proving, we 
are in need of light-weight monitors for controlling the flow of information. 
By instrumenting efficient monitoring techniques in such systems that oper- 
ate in an unpredictable privacy-critical environment, countermeasures will be 
enacted before irreparable information leaks happen. Information-flow policies, 
however, cannot be monitored with standard runtime verification techniques 
as they relate multiple runs of a system. For example, observational deter- 
minism [19,21,24] is a policy stating that altering non-observable input has 
no impact on the observable behavior. Hyperproperties [7] are a generalization 
of trace properties and are thus capable of expressing information-flow 
policies. HyperLTL [6] is a recently introduced temporal logic for 
hyperproperties, which extends Linear-time Temporal Logic (LTL) [20] with trace 
variables and explicit trace quantification. Observational determinism is 
expressed as the formula 
$\forall \pi, \pi'.\ (out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'})$, 
stating that all traces $\pi, \pi'$ should agree on the output as long as they 
agree on the inputs. 

In contrast to classic trace property monitoring, where a single run suffices 
to determine a violation, in runtime verification of HyperLTL formulas, we are 
concerned whether a set of runs through a system violates a given specifica- 
tion. In the common setting, those runs are given sequentially to the runtime 
monitor [1,2,12,13], which determines if the given set of runs violates the specifi- 
cation. An alternative view on HyperLTL monitoring is that every new incoming 
trace poses requirements on future traces. For example, the event {in, out} in 
the observational determinism example above asserts that for every other trace, 
the output out has to be enabled if in is enabled. Approaches based on static 
automata constructions [1,12,13] perform very well on this type of specifica- 
tions, although their scalability is intrinsically limited by certain parameters: 
The automaton construction becomes a bottleneck for more complex specifica- 
tions, especially with respect to the number of atomic propositions. Furthermore, 
the computational workload grows steadily with the number of incoming traces, 
as every trace seen so far has to be checked against every new trace. Even 
optimizations [12], which minimize the number of traces that must be stored, 
turn out to be too coarse grained, as the following example shows. Consider the 
monitoring of the HyperLTL formula 
$\forall \pi, \pi'.\ \square(a_\pi \rightarrow \neg b_{\pi'})$, which states 
that globally if $a$ occurs on any trace $\pi$, then $b$ is not allowed to hold 
on any trace $\pi'$, on the following incoming traces: 


    {a} {}  {}  {}    $\neg b$ is enforced on the 1st pos.           (1) 
    {a} {a} {}  {}    $\neg b$ is enforced on the 1st and 2nd pos.   (2) 
    {a} {}  {a} {}    $\neg b$ is enforced on the 1st and 3rd pos.   (3) 


In prior work [12], we observed that traces that pose fewer requirements 
on future traces can safely be discarded from the monitoring process. In the 
example above, the requirements of trace 1 are dominated by the requirements 
of trace 2, namely that $b$ is not allowed to hold on the first and second 
position of new incoming traces. Hence, trace 1 no longer needs to be stored in 
order to detect a violation. But with the language inclusion check proposed in 
[12], neither trace 2 nor trace 3 can be discarded, as they pose incomparable 
requirements. They have, however, overlapping constraints, that is, they both 
enforce $\neg b$ in the first step. 

To further improve the conciseness of the stored trace information, we use 
rewriting, which is a more fine-grained monitoring approach. The basic idea is 
to track the requirements that future traces have to fulfill, instead of storing 
a set of traces. In the example above, we would track the requirement that 
$b$ is not allowed to hold on the first three positions of every freshly incoming 
trace. Rewriting has been applied successfully to trace properties, namely LTL 
formulas [17]. The idea is to partially evaluate a given LTL specification 
$\varphi$ on an incoming event by unrolling $\varphi$ according to the expansion 
laws of the temporal operators. The result of a single rewrite is again an LTL 
formula representing the updated specification, which the continuing execution 
has to satisfy. We use rewriting techniques to reduce $\forall^2$HyperLTL 
formulas to LTL constraints and check those constraints for inconsistencies 
corresponding to violations. 
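
For the example formula, the idea of tracking requirements instead of traces 
can be sketched in a few lines of Python. This is a hand-written special case 
for this one formula, not the general algorithm of the paper; the function name 
monitor_invariant and the representation of traces as lists of sets are 
assumptions of the sketch. 

```python
def monitor_invariant(traces):
    """Monitor: for all pi, pi'. globally (a_pi -> not b_pi').

    Instead of storing traces, only two sets of positions are kept:
    positions where some trace had `a` (so `b` is forbidden there) and,
    by symmetry, positions where some trace had `b` (so `a` is forbidden).
    Returns (index of the first violating trace or None, forbidden b positions).
    """
    forbidden_b = set()
    forbidden_a = set()
    for n, trace in enumerate(traces):
        for i, event in enumerate(trace):
            if "a" in event:
                forbidden_b.add(i)
            if "b" in event:
                forbidden_a.add(i)
            if ("b" in event and i in forbidden_b) or ("a" in event and i in forbidden_a):
                return n, sorted(forbidden_b)
    return None, sorted(forbidden_b)


# The three traces from the example above.
example = [
    [{"a"}, set(), set(), set()],
    [{"a"}, {"a"}, set(), set()],
    [{"a"}, set(), {"a"}, set()],
]
verdict, requirement = monitor_invariant(example)
```

After the three traces, the tracked requirement covers positions 0, 1 and 2, 
that is, the first three positions, without any trace being stored. 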

In this paper, we introduce a complete and provably correct rewriting-based 
monitoring approach for $\forall^2$HyperLTL formulas. Our algorithm rewrites 
a HyperLTL formula and a single event into a constraint composed of plain 
LTL and HyperLTL. For example, assume the event $\{in, out\}$ while monitoring 
observational determinism formalized above. The first step of the rewriting 
applies the expansion laws for the temporal operators, which results in 
$(in_\pi \nleftrightarrow in_{\pi'}) \lor (out_\pi \leftrightarrow out_{\pi'}) \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$. 
The event $\{in, out\}$ is rewritten for atomic propositions indexed by the 
trace variable $\pi$. This means replacing each occurrence of $in_\pi$ or 
$out_\pi$ in the current expansion step, i.e., before the $\bigcirc$ operator, 
with $\top$. Additionally, we strip the $\pi'$ trace variable in the current 
expansion step from all other atomic propositions. This leaves us with 
$(\top \nleftrightarrow in) \lor (\top \leftrightarrow out) \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$. 
After simplification we have 
$\neg in \lor out \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$ 
as the new specification, which consists of a plain LTL part and a HyperLTL 
part. Based on this, we incrementally build a Boolean constraint system: we 
start by encoding the constraints corresponding to the LTL part and encode the 
HyperLTL part as variables. Those variables will then be incrementally defined 
when more elements of the trace become available. With this approach, we solely 
store the necessary information needed to detect violations of a given 
hyperproperty. 

We evaluate two implementations of our approach, based on BDDs and SAT- 
solving, against RVHyper [13], a highly optimized automaton-based monitoring 
tool for temporal hyperproperties. Our experiments show that the rewriting 
approach performs equally well in general and better on a class of formulas 
which we call guarded invariants, i.e., formulas that define a certain invariant 
relation between two traces. 


Related Work. With the need to express temporal hyperproperties in a suc- 
cinct and formal manner, the above mentioned temporal logics HyperLTL and 
HyperCTL* [6] have been proposed. The model-checking [6, 14, 15], satisfiability 
[9], and realizability [10] problems of HyperLTL have been studied before. 

Runtime verification of HyperLTL formulas was first considered for (co-)k- 
safety hyperproperties [1]. In the same paper, the notion of monitorability for 
HyperLTL was introduced. The authors have also identified syntactic classes 
of HyperLTL formulas that are monitorable and they proposed a monitoring 
algorithm based on a progression logic expressing trace interdependencies and 
the composition of an LTL3 monitor. 

Another automata-based approach for monitoring HyperLTL formulas was 
proposed in [12]. Given a HyperLTL specification, the algorithm starts by cre- 
ating a deterministic monitor automaton. For every incoming trace it is then 
checked that all combinations with the already seen traces are accepted by 
the automaton. In order to minimize the number of stored traces, a 
language-inclusion-based algorithm is proposed, which allows pruning traces with 
redundant information. Furthermore, a method is proposed to reduce the number 
of combinations of traces that have to be checked by analyzing the specification 
for relations such as reflexivity, symmetry, and transitivity with a 
HyperLTL-SAT solver [9,11]. The algorithm is implemented in the tool RVHyper 
[13], which was used to monitor information-flow policies and to detect spurious 
dependencies in hardware designs. 

Another rewriting-based monitoring approach for HyperLTL is outlined in 
[5]. The idea is to identify a set of propositions of interest and aggregate con- 
straints such that inconsistencies in the constraints indicate a violation of the 
HyperLTL formula. While the paper describes the building blocks for such a 
monitoring approach with a number of examples, we have, unfortunately, not 
been successful in applying the algorithm to other hyperproperties of interest, 
such as observational determinism. 

In [3], the authors study the complexity of monitoring hyperproperties. They 
show that the form and size of the input, as well as the formula have a sig- 
nificant impact on the feasibility of the monitoring process. They differentiate 
between several input forms and study their complexity: a set of linear traces, 
tree-shaped Kripke structures, and acyclic Kripke structures. For acyclic 
structures and alternation-free HyperLTL formulas, the problem's complexity 
gets as low as NC. 

In [4], the authors discuss examples where static analysis can be combined 
with runtime verification techniques to monitor HyperLTL formulas beyond the 
alternation-free fragment. They discuss the challenges in monitoring formulas 
beyond this fragment and lay the foundations towards a general method. 


2 Preliminaries 


Let $AP$ be a finite set of atomic propositions and let $\Sigma = 2^{AP}$ be the 
corresponding alphabet. An infinite trace $t \in \Sigma^\omega$ is an infinite 
sequence over the alphabet. A subset $T \subseteq \Sigma^\omega$ is called a 
trace property. A hyperproperty $H \subseteq 2^{(\Sigma^\omega)}$ is a 
generalization of a trace property. A finite trace $t \in \Sigma^+$ is a finite 
sequence over $\Sigma$. In the case of finite traces, $|t|$ denotes the length 
of a trace. We use the following notation to access and manipulate traces: Let 
$t$ be a trace and $i$ be a natural number. $t[i]$ denotes the $i$-th element of 
$t$. Therefore, $t[0]$ represents the first element of the trace. Let $j$ be a 
natural number. If $j \geq i$ and $i \leq |t|$, then $t[i, j]$ denotes the 
sequence $t[i]\,t[i+1] \cdots t[\min(j, |t| - 1)]$. Otherwise it denotes the 
empty trace $\epsilon$. $t[i\rangle$ denotes the suffix of $t$ starting at 
position $i$. For two finite traces $s$ and $t$, we denote their concatenation 
by $s \cdot t$. 
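
As an illustration of this notation, the following Python sketch represents a 
finite trace as a list of sets of atomic propositions. The helper names window 
and suffix are hypothetical, and the side condition is read as 
$j \geq i$ and $i \leq |t|$. 

```python
def window(t, i, j):
    """t[i, j]: the elements at positions i .. min(j, |t| - 1), else the empty trace."""
    if j >= i and i <= len(t):
        return t[i:min(j, len(t) - 1) + 1]
    return []


def suffix(t, i):
    """The suffix of t starting at position i."""
    return t[i:]


# A finite trace over AP = {a, b}; each event is a set of propositions.
t = [{"a"}, set(), {"b"}]

first = t[0]                      # t[0], the first element of the trace
clipped = window(t, 1, 5)         # j is clipped to |t| - 1
empty = window(t, 2, 1)           # j < i gives the empty trace
joined = suffix(t, 1) + [{"a"}]   # concatenation of two finite traces
```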


HyperLTL Syntax. HyperLTL [6] extends LTL with trace variables and trace 
quantifiers. Let $\mathcal{V}$ be a finite set of trace variables. The syntax of 
HyperLTL is given by the grammar 

\[
\begin{aligned}
\varphi &::= \forall \pi.\ \varphi \mid \exists \pi.\ \varphi \mid \psi \\
\psi &::= a_\pi \mid \psi \lor \psi \mid \neg \psi \mid \bigcirc \psi \mid \psi\ \mathcal{U}\ \psi,
\end{aligned}
\]

where $a \in AP$ is an atomic proposition and $\pi \in \mathcal{V}$ is a trace 
variable. Atomic propositions are indexed by trace variables. The explicit trace 
quantification enables us to express properties like “on all traces $\varphi$ 
must hold”, expressed by $\forall \pi.\ \varphi$. Dually, we can express “there 
exists a trace such that $\varphi$ holds”, expressed by $\exists \pi.\ \varphi$. 
We use the standard derived operators release 
$\varphi\ \mathcal{R}\ \psi := \neg(\neg\varphi\ \mathcal{U}\ \neg\psi)$, 
eventually $\lozenge \varphi := \mathit{true}\ \mathcal{U}\ \varphi$, globally 
$\square \varphi := \neg \lozenge \neg \varphi$, and weak until 
$\varphi_1\ \mathcal{W}\ \varphi_2 := (\varphi_1\ \mathcal{U}\ \varphi_2) \lor \square \varphi_1$. 
As we use the finite trace semantics, $\bigcirc \varphi$ denotes the strong 
version of the next operator, i.e., if a trace ends before the satisfaction of 
$\varphi$ can be determined, the satisfaction relation, defined below, evaluates 
to false. To enable duality in the finite trace setting, we additionally use the 
weak next operator $\widetilde{\bigcirc} \varphi$, which evaluates to true if a 
trace ends before the satisfaction of $\varphi$ can be determined and is defined 
as $\widetilde{\bigcirc} \varphi := \neg \bigcirc \neg \varphi$. We call $\psi$ 
of a HyperLTL formula $Q.\psi$, with an arbitrary quantifier prefix $Q$, the 
body of the formula. A HyperLTL formula $Q.\psi$ is in the alternation-free 
fragment if either $Q$ consists solely of universal quantifiers or solely of 
existential quantifiers. We also denote the respective alternation-free 
fragments as the $\forall^n$ fragment and the $\exists^n$ fragment, with $n$ 
being the number of quantifiers in the prefix. 


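Finite Trace Semantics. We recap the finite trace semantics for HyperLTL [5] 
which is itself based on the finite trace semantics of LTL [18]. In the 
following, when using $\mathcal{L}(\varphi)$ we refer to the finite trace 
semantics of a HyperLTL formula $\varphi$. Let 
$\Pi_{fin} : \mathcal{V} \rightarrow \Sigma^+$ be a partial function mapping 
trace variables to finite traces. We define $\epsilon[0]$ as the empty set. 
$\Pi_{fin}[i\rangle$ denotes the trace assignment that is equal to 
$\Pi_{fin}(\pi)[i\rangle$ for all $\pi \in \mathrm{dom}(\Pi_{fin})$. By slight 
abuse of notation, we write $t \in \Pi_{fin}$ to access traces $t$ in the image 
of $\Pi_{fin}$. The satisfaction of a HyperLTL formula $\varphi$ over a finite 
trace assignment $\Pi_{fin}$ and a set of finite traces $T$, denoted by 
$\Pi_{fin} \models_T \varphi$, is defined as follows: 

\[
\begin{array}{ll}
\Pi_{fin} \models_T a_\pi & \text{if } a \in \Pi_{fin}(\pi)[0] \\
\Pi_{fin} \models_T \neg\varphi & \text{if } \Pi_{fin} \not\models_T \varphi \\
\Pi_{fin} \models_T \varphi \lor \psi & \text{if } \Pi_{fin} \models_T \varphi \text{ or } \Pi_{fin} \models_T \psi \\
\Pi_{fin} \models_T \bigcirc\varphi & \text{if } \forall t \in \Pi_{fin}.\ |t| > 1 \text{ and } \Pi_{fin}[1\rangle \models_T \varphi \\
\Pi_{fin} \models_T \varphi\ \mathcal{U}\ \psi & \text{if } \exists i < \min_{t \in \Pi_{fin}} |t|.\ \Pi_{fin}[i\rangle \models_T \psi \land \forall j < i.\ \Pi_{fin}[j\rangle \models_T \varphi \\
\Pi_{fin} \models_T \exists\pi.\ \varphi & \text{if there is some } t \in T \text{ such that } \Pi_{fin}[\pi \mapsto t] \models_T \varphi \\
\Pi_{fin} \models_T \forall\pi.\ \varphi & \text{if for all } t \in T \text{ it holds that } \Pi_{fin}[\pi \mapsto t] \models_T \varphi
\end{array}
\]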
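
Restricted to a single trace, that is, to plain LTL on finite traces, the 
clauses above can be turned into a small evaluator. The following Python sketch 
is an illustration under assumptions: formulas are encoded as nested tuples (an 
encoding chosen here, not taken from the paper), and the cases for true, 
conjunction and weak next are included directly instead of being derived. 

```python
def holds(f, t):
    """Finite-trace LTL satisfaction; t is a non-empty list of sets of propositions."""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in t[0]
    if op == "not":
        return not holds(f[1], t)
    if op == "or":
        return holds(f[1], t) or holds(f[2], t)
    if op == "and":
        return holds(f[1], t) and holds(f[2], t)
    if op == "next":    # strong next: false if the trace ends here
        return len(t) > 1 and holds(f[1], t[1:])
    if op == "wnext":   # weak next: true if the trace ends here
        return len(t) <= 1 or holds(f[1], t[1:])
    if op == "until":   # some position i satisfies f[2], all earlier ones satisfy f[1]
        return any(
            holds(f[2], t[i:]) and all(holds(f[1], t[j:]) for j in range(i))
            for i in range(len(t))
        )
    raise ValueError(f"unknown operator: {op}")


def globally(f):
    """Derived operator: globally f := not eventually not f."""
    return ("not", ("until", ("true",), ("not", f)))


a, b = ("ap", "a"), ("ap", "b")
strong = holds(("next", a), [{"a"}])     # strong next on a one-element trace
weak = holds(("wnext", a), [{"a"}])      # weak next on a one-element trace
invariant = holds(globally(a), [{"a"}, {"a"}])
```

The difference between the two next operators shows up exactly on traces of 
length one, where the strong version is false and the weak version is true. 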


Due to duality of $\mathcal{U}/\mathcal{R}$, $\bigcirc/\widetilde{\bigcirc}$, 
$\exists/\forall$, and the standard Boolean operators, every HyperLTL formula 
$\varphi$ can be transformed into negation normal form (NNF), i.e., for every 
$\varphi$ there is some $\psi$ in negation normal form such that for all 
$\Pi_{fin}$ and $T$ it holds that $\Pi_{fin} \models_T \varphi$ if, and only if, 
$\Pi_{fin} \models_T \psi$. The standard LTL semantics, written 
$t \models_{\mathrm{LTL}_{fin}} \varphi$, for some LTL formula $\varphi$ is equal 
to $\{\pi \mapsto t\} \models_\emptyset \varphi'$, where $\varphi'$ is derived 
from $\varphi$ by replacing every proposition $p \in AP$ by $p_\pi$. 


3 Rewriting HyperLTL 


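Given the body $\varphi$ of a $\forall^2$HyperLTL formula 
$\forall \pi, \pi'.\ \varphi$, and a finite trace $t \in \Sigma^+$, we define 
alternative language characterizations. These capture the intuitive idea that, 
if one fixes a finite trace $t$, the language of $\forall \pi, \pi'.\ \varphi$ 
includes exactly those traces $t'$ that satisfy $\varphi$ in conjunction with 
$t$. 

\[
\begin{aligned}
\mathcal{L}^{\pi}_t(\varphi) &:= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \varphi \} \\
\mathcal{L}^{\pi'}_t(\varphi) &:= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t', \pi' \mapsto t\} \models_\emptyset \varphi \} \\
\mathcal{L}_t(\varphi) &:= \mathcal{L}^{\pi}_t(\varphi) \cap \mathcal{L}^{\pi'}_t(\varphi)
\end{aligned}
\]

We call $\hat{\varphi} := \varphi \land \varphi[\pi'/\pi, \pi/\pi']$ the 
symmetric closure of $\varphi$, where $\varphi[\pi'/\pi, \pi/\pi']$ represents 
the expression $\varphi$ in which the trace variables $\pi, \pi'$ are swapped. 
The language of the symmetric closure, when fixing one trace variable, is 
equivalent to the language of $\varphi$. 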
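
The symmetric closure is a purely syntactic operation, which the following 
Python sketch illustrates. The tuple encoding of formulas, with atoms written as 
("ap", name, variable) and negated atoms as ("nap", name, variable), and the 
variable names "pi" and "pi'" are assumptions of this sketch. 

```python
def swap(f):
    """Swap the trace variables pi and pi' in a formula given as nested tuples."""
    if f[0] in ("ap", "nap"):
        op, name, var = f
        return (op, name, "pi'" if var == "pi" else "pi")
    # Any other operator: rebuild the node with swapped subformulas.
    return (f[0],) + tuple(swap(g) for g in f[1:])


def symmetric_closure(f):
    """The conjunction of f with its variable-swapped copy."""
    return ("and", f, swap(f))


# Body of the introductory example in NNF: (not a_pi) or (not b_pi').
body = ("or", ("nap", "a", "pi"), ("nap", "b", "pi'"))
closure = symmetric_closure(body)
```

Swapping twice returns the original formula, which is why fixing either trace 
variable of the closure describes the same language. 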


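Lemma 1. Given the body $\varphi$ of a $\forall^2$HyperLTL formula 
$\forall \pi, \pi'.\ \varphi$, and a finite trace $t \in \Sigma^+$, it holds 
that $\mathcal{L}^{\pi}_t(\hat{\varphi}) = \mathcal{L}_t(\varphi)$. 


Proof. 

\[
\begin{aligned}
\mathcal{L}^{\pi}_t(\hat{\varphi}) &= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \hat{\varphi} \} \\
&= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \varphi \land \varphi[\pi'/\pi, \pi/\pi'] \} \\
&= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \varphi,\ \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \varphi[\pi'/\pi, \pi/\pi'] \} \\
&= \{ t' \in \Sigma^+ \mid \{\pi \mapsto t, \pi' \mapsto t'\} \models_\emptyset \varphi,\ \{\pi \mapsto t', \pi' \mapsto t\} \models_\emptyset \varphi \} = \mathcal{L}_t(\varphi)
\end{aligned}
\]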


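We exploit this to rewrite a $\forall^2$HyperLTL formula into an LTL formula. We 
define the projection $\varphi|^{\pi}_t$ of the body $\varphi$ of a 
$\forall^2$HyperLTL formula $\forall \pi, \pi'.\ \varphi$ in NNF and a finite 
trace $t \in \Sigma^+$ to an LTL formula recursively on the structure of 
$\varphi$: 

\[
\begin{aligned}
a_\pi|^{\pi}_t &:= \begin{cases} \top & \text{if } a \in t[0] \\ \bot & \text{otherwise} \end{cases}
&\qquad (\neg a_\pi)|^{\pi}_t &:= \begin{cases} \top & \text{if } a \notin t[0] \\ \bot & \text{otherwise} \end{cases} \\
a_{\pi'}|^{\pi}_t &:= a
&\qquad (\neg a_{\pi'})|^{\pi}_t &:= \neg a \\
(\varphi \lor \psi)|^{\pi}_t &:= \varphi|^{\pi}_t \lor \psi|^{\pi}_t
&\qquad (\varphi \land \psi)|^{\pi}_t &:= \varphi|^{\pi}_t \land \psi|^{\pi}_t
\end{aligned}
\]

\[
\begin{aligned}
(\bigcirc \varphi)|^{\pi}_t &:= \begin{cases} \bot & \text{if } |t| \leq 1 \\ \bigcirc(\varphi|^{\pi}_{t[1\rangle}) & \text{otherwise} \end{cases} \\
(\widetilde{\bigcirc} \varphi)|^{\pi}_t &:= \begin{cases} \top & \text{if } |t| \leq 1 \\ \widetilde{\bigcirc}(\varphi|^{\pi}_{t[1\rangle}) & \text{otherwise} \end{cases} \\
(\varphi\ \mathcal{U}\ \psi)|^{\pi}_t &:= \begin{cases} \psi|^{\pi}_t & \text{if } |t| \leq 1 \\ \psi|^{\pi}_t \lor (\varphi|^{\pi}_t \land \bigcirc((\varphi\ \mathcal{U}\ \psi)|^{\pi}_{t[1\rangle})) & \text{otherwise} \end{cases} \\
(\varphi\ \mathcal{R}\ \psi)|^{\pi}_t &:= \begin{cases} \psi|^{\pi}_t & \text{if } |t| \leq 1 \\ \psi|^{\pi}_t \land (\varphi|^{\pi}_t \lor \widetilde{\bigcirc}((\varphi\ \mathcal{R}\ \psi)|^{\pi}_{t[1\rangle})) & \text{otherwise} \end{cases}
\end{aligned}
\]

Theorem 1. Given a $\forall^2$HyperLTL formula $\forall \pi, \pi'.\ \varphi$ and 
any two finite traces $t, t' \in \Sigma^+$ it holds that 
$t' \in \mathcal{L}^{\pi}_t(\varphi)$ if, and only if, 
$t' \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_t$. 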


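Proof. By induction on the size of $t$. Induction Base ($t = e$, where 
$e \in \Sigma$): Let $t' \in \Sigma^+$ be arbitrarily chosen. We distinguish by 
structural induction the following cases over the formula $\varphi$. We begin 
with the base cases. 


– $a_\pi$: we know by definition that $a_\pi|^{\pi}_t$ equals $\top$ if 
  $a \in t[0]$ and $\bot$ otherwise, so it follows that 
  $t' \models_{\mathrm{LTL}_{fin}} a_\pi|^{\pi}_t \Leftrightarrow a \in t[0] \Leftrightarrow t' \in \mathcal{L}^{\pi}_t(a_\pi)$. 

– $a_{\pi'}$: $t' \in \mathcal{L}^{\pi}_t(a_{\pi'}) \Leftrightarrow a \in t'[0] \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} a \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} a_{\pi'}|^{\pi}_t$. 

– $\neg a_\pi$ and $\neg a_{\pi'}$ are proven analogously. 


The structural induction hypothesis states that 
$\forall t' \in \Sigma^+.\ t' \in \mathcal{L}^{\pi}_t(\psi) \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t$ 
(SIH1), where $\psi$ is a strict subformula of $\varphi$. 


– $\varphi \lor \psi$: $t' \in \mathcal{L}^{\pi}_t(\varphi \lor \psi) \Leftrightarrow (t' \in \mathcal{L}^{\pi}_t(\varphi)) \lor (t' \in \mathcal{L}^{\pi}_t(\psi)) \overset{SIH1}{\Leftrightarrow} (t' \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_t) \lor (t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t) \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} (\varphi \lor \psi)|^{\pi}_t$. 

– $\bigcirc \varphi$: $t' \in \mathcal{L}^{\pi}_t(\bigcirc \varphi) \overset{|t| = 1}{\Leftrightarrow} \bot \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} (\bigcirc \varphi)|^{\pi}_t$. 

– $\varphi\ \mathcal{U}\ \psi$: $t' \in \mathcal{L}^{\pi}_t(\varphi\ \mathcal{U}\ \psi) \overset{|t| = 1}{\Leftrightarrow} t' \in \mathcal{L}^{\pi}_t(\psi) \overset{SIH1}{\Leftrightarrow} t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t \overset{|t| = 1}{\Leftrightarrow} t' \models_{\mathrm{LTL}_{fin}} (\varphi\ \mathcal{U}\ \psi)|^{\pi}_t$. 

– $\varphi \land \psi$, $\widetilde{\bigcirc} \varphi$ and 
  $\varphi\ \mathcal{R}\ \psi$ are proven analogously. 


Induction Step ($t = e \cdot t^*$, where $e \in \Sigma$, $t^* \in \Sigma^+$): 
The induction hypothesis states that 
$\forall t' \in \Sigma^+.\ t' \in \mathcal{L}^{\pi}_{t^*}(\varphi) \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_{t^*}$ 
(IH). We make use of structural induction over $\varphi$. All cases without 
temporal operators are covered as their proofs above were independent of $|t|$. 
The structural induction hypothesis states for all strict subformulas $\psi$ 
that 
$\forall t' \in \Sigma^+.\ t' \in \mathcal{L}^{\pi}_t(\psi) \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t$ 
(SIH2). 


– $\bigcirc \varphi$: $t' \in \mathcal{L}^{\pi}_t(\bigcirc \varphi) \Leftrightarrow t'[1\rangle \in \mathcal{L}^{\pi}_{t^*}(\varphi) \overset{IH}{\Leftrightarrow} t'[1\rangle \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_{t^*} \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} \bigcirc(\varphi|^{\pi}_{t^*}) \overset{t = e \cdot t^*}{\Leftrightarrow} t' \models_{\mathrm{LTL}_{fin}} (\bigcirc \varphi)|^{\pi}_t$. 

– $\varphi\ \mathcal{U}\ \psi$: $t' \in \mathcal{L}^{\pi}_t(\varphi\ \mathcal{U}\ \psi) \Leftrightarrow (t' \in \mathcal{L}^{\pi}_t(\psi)) \lor ((t' \in \mathcal{L}^{\pi}_t(\varphi)) \land (t'[1\rangle \in \mathcal{L}^{\pi}_{t^*}(\varphi\ \mathcal{U}\ \psi))) \overset{SIH2,\ IH}{\Leftrightarrow} (t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t) \lor ((t' \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_t) \land (t'[1\rangle \models_{\mathrm{LTL}_{fin}} (\varphi\ \mathcal{U}\ \psi)|^{\pi}_{t^*})) \Leftrightarrow (t' \models_{\mathrm{LTL}_{fin}} \psi|^{\pi}_t) \lor ((t' \models_{\mathrm{LTL}_{fin}} \varphi|^{\pi}_t) \land (t' \models_{\mathrm{LTL}_{fin}} \bigcirc((\varphi\ \mathcal{U}\ \psi)|^{\pi}_{t^*}))) \Leftrightarrow t' \models_{\mathrm{LTL}_{fin}} (\varphi\ \mathcal{U}\ \psi)|^{\pi}_t$. 

– $\widetilde{\bigcirc} \varphi$ and $\varphi\ \mathcal{R}\ \psi$ are proven 
  analogously. 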


4 Constraint-Based Monitoring 


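For monitoring, we need to define an incremental rewriting that accurately 
models the semantics of $\varphi|^{\pi}_t$ while still being able to detect 
violations early. To this end, we define an operation $\varphi[\pi, e, i]$, 
where $e \in \Sigma$ is an event and $i$ is the current position in the trace. 
$\varphi[\pi, e, i]$ transforms $\varphi$ into a propositional formula, where 
the variables are either indexed atomic propositions $p_i$ for $p \in AP$, or 
variables $v^-_{\varphi', i+1}$ and $v^+_{\varphi', i+1}$ that act as 
placeholders until new information about the trace comes in. Whenever the next 
event $e'$ occurs, the variables are defined with the result of 
$\varphi'[\pi, e', i+1]$. If the trace ends, the variables are set to true and 
false for $v^+$ and $v^-$, respectively. We define $\varphi[\pi, e, i]$ of a 
$\forall^2$HyperLTL formula $\forall \pi, \pi'.\ \varphi$ in NNF, event 
$e \in \Sigma$, and $i \geq 0$ recursively on the structure of the body 
$\varphi$: 

\[
\begin{aligned}
a_\pi[\pi, e, i] &:= \begin{cases} \top & \text{if } a \in e \\ \bot & \text{otherwise} \end{cases}
&\qquad (\neg a_\pi)[\pi, e, i] &:= \begin{cases} \top & \text{if } a \notin e \\ \bot & \text{otherwise} \end{cases} \\
a_{\pi'}[\pi, e, i] &:= a_i
&\qquad (\neg a_{\pi'})[\pi, e, i] &:= \neg a_i \\
(\varphi \lor \psi)[\pi, e, i] &:= \varphi[\pi, e, i] \lor \psi[\pi, e, i]
&\qquad (\varphi \land \psi)[\pi, e, i] &:= \varphi[\pi, e, i] \land \psi[\pi, e, i] \\
(\bigcirc \varphi)[\pi, e, i] &:= v^-_{\varphi, i+1}
&\qquad (\widetilde{\bigcirc} \varphi)[\pi, e, i] &:= v^+_{\varphi, i+1}
\end{aligned}
\]

\[
\begin{aligned}
(\varphi\ \mathcal{U}\ \psi)[\pi, e, i] &:= \psi[\pi, e, i] \lor (\varphi[\pi, e, i] \land v^-_{\varphi\,\mathcal{U}\,\psi,\, i+1}) \\
(\varphi\ \mathcal{R}\ \psi)[\pi, e, i] &:= \psi[\pi, e, i] \land (\varphi[\pi, e, i] \lor v^+_{\varphi\,\mathcal{R}\,\psi,\, i+1})
\end{aligned}
\]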

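We encode a $\forall^2$HyperLTL formula and finite traces into a constraint 
system, which, as we will show, is satisfiable if and only if the given traces 
satisfy the formula w.r.t. the finite semantics of HyperLTL. We write 
$v_{\varphi, i}$ to denote either $v^-_{\varphi, i}$ or $v^+_{\varphi, i}$. For 
$e \in \Sigma$ and $t \in \Sigma^*$, we define 

\[
\begin{aligned}
\mathit{constr}(v^+_{\varphi, i}, \epsilon) &:= \top \\
\mathit{constr}(v^-_{\varphi, i}, \epsilon) &:= \bot \\
\mathit{constr}(v_{\varphi, i}, e \cdot t) &:= \varphi[\pi, e, i] \land \bigwedge_{v_{\psi, i+1} \in \varphi[\pi, e, i]} (v_{\psi, i+1} \rightarrow \mathit{constr}(v_{\psi, i+1}, t)) \\
\mathit{enc}^n_{AP}(\epsilon) &:= \top \\
\mathit{enc}^n_{AP}(e \cdot t) &:= \bigwedge_{a \in AP \cap e} a_n \land \bigwedge_{a \in AP \setminus e} \neg a_n \land \mathit{enc}^{n+1}_{AP}(t),
\end{aligned}
\]

where we use $v_{\varphi, i+1} \in \varphi[\pi, e, i]$ to denote variables 
$v_{\varphi, i+1}$ occurring in the propositional formula $\varphi[\pi, e, i]$. 
$\mathit{enc}$ is used to transform a trace into a propositional formula, e.g., 
$\mathit{enc}^0_{\{a,b\}}(\{a\}\{a, b\}) = a_0 \land \neg b_0 \land a_1 \land b_1$. 
For $n = 0$ we omit the annotation, i.e., we write $\mathit{enc}_{AP}(t)$ 
instead of $\mathit{enc}^0_{AP}(t)$. Also we omit the index $AP$ if it is clear 
from the context. By slight abuse of notation, we use 
$\mathit{constr}^n(\varphi, t)$ for some quantifier free HyperLTL formula 
$\varphi$ to denote $\mathit{constr}(v_{\varphi, n}, t)$ if $|t| > 0$. For a 
trace $t' \in \Sigma^+$, we use the notation 
$\mathit{enc}(t') \models \mathit{constr}(\varphi, t)$, which evaluates to true 
if, and only if $\mathit{enc}(t') \land \mathit{constr}(\varphi, t)$ is 
satisfiable. 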
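
The trace encoding can be illustrated with a short Python sketch. The 
representation of a literal as a (name, position, polarity) triple and the 
function name enc are choices made for this sketch only. 

```python
def enc(trace, ap, n=0):
    """Encode a finite trace as a list of literals (name, position, polarity).

    A proposition contained in the event at offset k yields a positive literal
    at position n + k; every other proposition in `ap` yields a negative one.
    """
    literals = []
    for offset, event in enumerate(trace):
        for a in sorted(ap):
            literals.append((a, n + offset, a in event))
    return literals


# The example from the text: the trace {a}{a, b} over AP = {a, b}.
encoded = enc([{"a"}, {"a", "b"}], {"a", "b"})
```

The resulting list reads as the conjunction a_0, not b_0, a_1, b_1, matching 
the example given above. Passing n shifts all positions, which corresponds to 
the superscript of the encoding. 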


4.1 Algorithm 


Figure 1 depicts our constraint-based algorithm. Note that this algorithm can 
be used in an offline and online fashion. Before we give algorithmic details, 
consider again the observational determinism example from the introduction, 
which is expressed as the $\forall^2$HyperLTL formula 
$\forall \pi, \pi'.\ (out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'})$. 
The basic idea of the algorithm is to transform the HyperLTL formula 
to a formula consisting partially of LTL, which expresses the requirements of 
the incoming trace in the current step, and partially of HyperLTL. Assuming 
the event $\{in, out\}$, we transform the observational determinism formula to 
the following formula: 
$\neg in \lor out \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$. 

    Input : $\forall \pi, \pi'.\ \varphi$, $T \subseteq \Sigma^+$ 
    Output: violation or no violation 

     1  $\psi := \mathit{nnf}(\hat{\varphi})$ 
     2  $C := \top$ 
     3  foreach $t \in T$ do 
     4      $C_t := v_{\psi, 0}$ 
     5      $t_{enc} := \top$ 
     6      while $e_i := \mathit{getNextEvent}(t)$ do 
     7          $t_{enc} := t_{enc} \land \mathit{enc}^i(e_i)$ 
     8          foreach $v_{\varphi, i} \in C_t$ do 
     9              $c := \varphi[\pi, e_i, i]$ 
    10              $C_t := C_t \land (v_{\varphi, i} \rightarrow c)$ 
    11          if $\neg \mathit{sat}(C \land C_t \land t_{enc})$ then 
    12              return violation 
    13      foreach $v^+_{\varphi, i+1} \in C_t$ do 
    14          $C_t := C_t \land v^+_{\varphi, i+1}$ 
    15      foreach $v^-_{\varphi, i+1} \in C_t$ do 
    16          $C_t := C_t \land \neg v^-_{\varphi, i+1}$ 
    17      $C := C \land C_t$ 
    18  return no violation 

Fig. 1. Constraint-based algorithm for monitoring $\forall^2$HyperLTL formulas. 

A Boolean constraint system is then built incrementally: we start by encoding 
the constraints corresponding to the LTL part (in front of the next-operator) 
and encode the HyperLTL part (after the next-operator) as variables that are 
defined when more events of the trace come in. We continue by explaining the 
algorithm in detail. In line 1, we construct $\psi$ as the negation normal form 
of the symmetric closure of the original formula. We build two constraint 
systems: $C$ containing constraints of previous traces and $C_t$ (built 
incrementally) containing the constraints for the current trace $t$. 
Consequently, we initialize $C$ with $\top$ and $C_t$ with $v_{\psi, 0}$ 
(lines 2 and 4). If the trace ends, we define the remaining $v$ variables 
according to their polarities and add $C_t$ to $C$. For each new event $e_i$ in 
the trace $t$, and each “open” constraint in $C_t$ corresponding to step $i$, 
i.e., $v_{\varphi, i} \in C_t$, we rewrite the formula $\varphi$ (line 9) and 
define $v_{\varphi, i}$ with the rewriting result, which potentially introduces 
new open constraints $v_{\varphi', i+1}$ for the next step $i + 1$. The 
constraint encoding of the current trace is aggregated in the constraint 
$t_{enc}$ (line 7). If the constraint system given the encoding of the current 
trace turns out to be unsatisfiable, a violation of the specification is 
detected, which is then returned. 

In the following, we sketch two algorithmic improvements. First, instead of 
storing the constraints corresponding to traces individually, we use a new data 
structure, which is a tree maintaining nodes of formulas, their corresponding 
variables and also child nodes. Such a node corresponds to already seen rewrites. 
The initial node captures the (transformed) specification (similar to line 4) and 
it is also the root of the tree structure, representing all the generated constraints 
which replaces C in Fig. 1. Whenever a trace deviates in its rewrite result a 
new child or branch is added to the tree. If a rewrite result is already present 
in the node tree structure there is no need to create any new constraints nor 
new variables. This is crucial in case we observe many equal traces or traces 
behaving effectively the same. In case no new constraints were added to the 
constraint system, we omit a superfluous check for satisfiability. 
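
The node tree can be pictured as a trie keyed by rewrite results. The following is a minimal sketch, assuming that each rewrite result is available as a canonical, hashable string; the names `Node`, `NodeTree` and `observe` are hypothetical and not taken from the implementation evaluated in Sect. 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One already-seen rewrite result together with its constraint variable."""
    formula: str                      # canonical text of the rewrite result
    var: int                          # fresh variable standing for this node
    children: Dict[str, "Node"] = field(default_factory=dict)


class NodeTree:
    """Shares constraints between traces whose rewrite results coincide."""

    def __init__(self, spec: str) -> None:
        self.next_var = 0
        self.root = self._fresh(spec)   # root holds the (transformed) specification

    def _fresh(self, formula: str) -> Node:
        node = Node(formula, self.next_var)
        self.next_var += 1
        return node

    def observe(self, rewrites: List[str]) -> bool:
        """Walk one trace's rewrite results; return True iff new nodes were created.

        A False result means no constraint was added, so the satisfiability
        check can be skipped for this trace.
        """
        node, changed = self.root, False
        for result in rewrites:
            child = node.children.get(result)
            if child is None:               # the trace deviates: add a branch
                child = self._fresh(result)
                node.children[result] = child
                changed = True
            node = child
        return changed
```

A second trace with the same rewrite results walks existing nodes only, so `observe` returns `False` and neither variables nor constraints are created for it.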

Second, we use conjunct splitting to utilize the node tree optimization even
more. We illustrate the basic idea with an example. Consider
$\forall \pi, \pi'. \varphi$ with
$\varphi = \Box((a_\pi \leftrightarrow a_{\pi'}) \lor (b_\pi \leftrightarrow b_{\pi'}))$,
which demands that, on all executions and at each position, at least one of the
propositions $a$ or $b$ agrees in its evaluation. Consider the two traces
$t_1 = \{a\}\{a\}\{a\}$ and $t_2 = \{a\}\{a, b\}\{a\}$, which satisfy the
specification. As both traces feature the same first event, they also share the
same rewrite result for the first position. Interestingly, on the second
position, we get $(a \lor \neg b) \land s_\varphi$ for $t_1$ and
$(a \lor b) \land s_\varphi$ for $t_2$ as the rewrite results. While these
constraints are no longer equal, by the nature of invariants, both feature the
same subterm on the right-hand side of the conjunction. We split the resulting
constraint on its syntactic structure, such that we no longer have to introduce
a branch in the tree.
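
Conjunct splitting can be sketched as follows, under the assumption that rewrite results are given as small syntax trees. The tuple encoding, the helper names and the atom `s_phi` (standing for the shared invariant subterm of the example) are illustrative choices, not the representation used by the implementation.

```python
from typing import List, Tuple, Union

# Formulas as nested tuples: ("and", f, g), ("or", f, g), ("not", f), or an atom string.
Formula = Union[str, Tuple]


def split_conjuncts(formula: Formula) -> List[Formula]:
    """Flatten nested top-level conjunctions into a list of conjuncts."""
    if isinstance(formula, tuple) and formula[0] == "and":
        return split_conjuncts(formula[1]) + split_conjuncts(formula[2])
    return [formula]


def new_conjuncts(formula: Formula, seen: set) -> List[Formula]:
    """Return the conjuncts not stored yet and record them in `seen`."""
    fresh = [c for c in split_conjuncts(formula) if c not in seen]
    seen.update(fresh)
    return fresh


# Rewrite results at the second position of t1 and t2 from the example.
r1 = ("and", ("or", "a", ("not", "b")), "s_phi")
r2 = ("and", ("or", "a", "b"), "s_phi")

seen: set = set()
first = new_conjuncts(r1, seen)
second = new_conjuncts(r2, seen)   # only the differing disjunction is new
```

After splitting, the second trace contributes only its disjunction; the shared conjunct is found among the stored ones and reused.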


4.2 Correctness 


In this technical subsection, we formally prove the correctness of our algorithm
by showing that our incremental construction of the Boolean constraints is
equisatisfiable to the HyperLTL rewriting presented in Sect. 3. We begin by
showing that satisfiability is preserved when shifting the indices, as stated by
the following lemma.


Lemma 2. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ over atomic
propositions AP, any finite traces $t, t' \in \Sigma^+$ and $n > 0$ it holds that
$enc_{AP}(t') \models constr(\varphi, t) \Leftrightarrow enc^n_{AP}(t') \models constr^n(\varphi, t)$.


Proof. By renaming of the positional indices. 


In the following lemma and corollary, we show that the semantics of the next 
operators matches the finite LTL semantics. 


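Lemma 3. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ over atomic
propositions AP and any finite traces $t, t' \in \Sigma^+$ it holds that
$enc(t') \models constr(\bigcirc\varphi, t) \Leftrightarrow enc(t') \models constr(v_{\varphi,1}, t[1\rangle) \Leftrightarrow enc(t'[1\rangle) \models constr(v_{\varphi,0}, t[1\rangle)$.


Proof. Let $\varphi, t, t'$ be given. It holds that
$constr(\bigcirc\varphi, t) = constr(v_{\varphi,1}, t[1\rangle)$ by definition. As
$constr(v_{\varphi,1}, t[1\rangle)$ by construction does not contain any variables
with positional index 0, we only need to check satisfiability with respect to
$enc(t'[1\rangle)$. Thus
$enc(t') \models constr(\bigcirc\varphi, t) \Leftrightarrow enc(t') \models constr(v_{\varphi,1}, t[1\rangle) \Leftrightarrow enc^1(t'[1\rangle) \models constr(v_{\varphi,1}, t[1\rangle) \overset{\text{Lem. 2}}{\Leftrightarrow} enc(t'[1\rangle) \models constr(v_{\varphi,0}, t[1\rangle)$.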


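Corollary 1. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ over atomic
propositions AP and any finite traces $t, t' \in \Sigma^+$ it holds that
$enc(t') \models constr(\bigcirc_w\varphi, t) \Leftrightarrow enc(t') \models constr(v_{\varphi,1}, t[1\rangle) \Leftrightarrow enc(t'[1\rangle) \models constr(v_{\varphi,0}, t[1\rangle)$.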


We will now state the correctness theorem, namely that our algorithm preserves 
the HyperLTL rewriting semantics. 


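Theorem 2. For every $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ in negation
normal form over atomic propositions AP and any finite trace $t \in \Sigma^+$ it
holds that
$\forall t' \in \Sigma^+.\ t' \models_{LTL_{fin}} \varphi|_t \Leftrightarrow enc_{AP}(t') \models constr(\varphi, t)$.

Proof. By induction over the size of $t$. Induction Base ($t = e$, where
$e \in \Sigma$): We choose $t' \in \Sigma^+$ arbitrarily. We distinguish by
structural induction the following cases over the formula $\varphi$:

— $a_\pi$: $constr(a_\pi, e) = (a_\pi)[\pi, e, 0] = \top$ if, and only if, $a \in e$.
  Thus $enc(t') \models constr(a_\pi, e) \Leftrightarrow a \in e \Leftrightarrow t' \models_{LTL_{fin}} a_\pi|_e$.
— $a_{\pi'}$: $constr(a_{\pi'}, e) = (a_{\pi'})[\pi, e, 0] = a_0$. Thus
  $enc(t') \models constr(a_{\pi'}, e) \Leftrightarrow enc(t') \models a_0 \overset{def}{\Leftrightarrow} a \in t'[0] \overset{def}{\Leftrightarrow} t' \models_{LTL_{fin}} a \Leftrightarrow t' \models_{LTL_{fin}} a_{\pi'}|_e$.
— $\neg a_\pi$ and $\neg a_{\pi'}$ are proven analogously.

The structural induction hypothesis states that
$\forall t' \in \Sigma^+.\ t' \models_{LTL_{fin}} \psi|_e \Leftrightarrow enc(t') \models constr(\psi, e)$
(SIH1), where $\psi$ is a strict subformula of $\varphi$.

— $\varphi \lor \psi$:
  $t' \models_{LTL_{fin}} (\varphi \lor \psi)|_e \Leftrightarrow (t' \models_{LTL_{fin}} \varphi|_e) \lor (t' \models_{LTL_{fin}} \psi|_e) \overset{SIH1}{\Leftrightarrow} (enc(t') \models constr(\varphi, e)) \lor (enc(t') \models constr(\psi, e)) \Leftrightarrow (enc(t') \models \varphi[\pi, e, 0]) \lor (enc(t') \models \psi[\pi, e, 0]) \Leftrightarrow enc(t') \models \varphi[\pi, e, 0] \lor \psi[\pi, e, 0] \Leftrightarrow enc(t') \models (\varphi \lor \psi)[\pi, e, 0] \Leftrightarrow enc(t') \models constr(\varphi \lor \psi, e)$.
— $\bigcirc\varphi$:
  $constr(\bigcirc\varphi, e) = (\bigcirc\varphi)[\pi, e, 0] = v_{\varphi,1} \land (v_{\varphi,1} \to constr(v_{\varphi,1}, \epsilon)) = \bot$.
  Thus $t' \models_{LTL_{fin}} (\bigcirc\varphi)|_e = \bot \Leftrightarrow enc(t') \models \bot$.
— $\varphi\,\mathcal{U}\,\psi$:
  $constr(\varphi\,\mathcal{U}\,\psi, e) = (\varphi\,\mathcal{U}\,\psi)[\pi, e, 0] = \psi[\pi, e, 0] \lor (\varphi[\pi, e, 0] \land constr(v_{\varphi\mathcal{U}\psi,1}, \epsilon)) = \psi[\pi, e, 0] = constr(\psi, e)$.
  Thus $t' \models_{LTL_{fin}} (\varphi\,\mathcal{U}\,\psi)|_e \Leftrightarrow t' \models_{LTL_{fin}} \psi|_e \overset{SIH1}{\Leftrightarrow} enc(t') \models constr(\psi, e)$.
— $\varphi \land \psi$, $\bigcirc_w\varphi$, and $\varphi\,\mathcal{R}\,\psi$ are proven analogously.

Induction Step ($t = e \cdot t^*$, where $e \in \Sigma$ and $t^* \in \Sigma^+$): The
induction hypothesis states that
$\forall t' \in \Sigma^+.\ t' \models_{LTL_{fin}} \varphi|_{t^*} \Leftrightarrow enc(t') \models constr(\varphi, t^*)$
(IH). We make use of structural induction over $\varphi$. All base cases are
covered as their proofs above are independent of $|t|$. The structural induction
hypothesis (SIH) states for all strict subformulas $\psi$ that
$\forall t' \in \Sigma^+.\ t' \models_{LTL_{fin}} \psi|_t \Leftrightarrow enc(t') \models constr(\psi, t)$.

— $\varphi \lor \psi$:
\[
\begin{aligned}
& t' \models_{LTL_{fin}} (\varphi \lor \psi)|_t \Leftrightarrow t' \models_{LTL_{fin}} \varphi|_t \lor t' \models_{LTL_{fin}} \psi|_t \\
\overset{SIH}{\Leftrightarrow}\ & enc(t') \models constr(\varphi, t) \lor enc(t') \models constr(\psi, t) \\
\overset{def}{\Leftrightarrow}\ & enc(t') \models \Big(\varphi[\pi, e, 0] \land \underbrace{\bigwedge_{v_{\varphi',1} \in \varphi[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*)}_{A}\Big) \\
& \lor\ enc(t') \models \Big(\psi[\pi, e, 0] \land \bigwedge_{v_{\psi',1} \in \psi[\pi, e, 0]} v_{\psi',1} \to constr(v_{\psi',1}, t^*)\Big) \\
\overset{\dagger}{\Leftrightarrow}\ & enc(t') \models (\varphi[\pi, e, 0] \lor \psi[\pi, e, 0]) \\
& \land \bigwedge_{v_{\varphi',1} \in \varphi[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*)
  \land \bigwedge_{v_{\psi',1} \in \psi[\pi, e, 0]} v_{\psi',1} \to constr(v_{\psi',1}, t^*) \\
\Leftrightarrow\ & enc(t') \models (\varphi \lor \psi)[\pi, e, 0] \land \bigwedge_{v_{\varphi',1} \in (\varphi \lor \psi)[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*) \\
\overset{def}{\Leftrightarrow}\ & enc(t') \models constr(\varphi \lor \psi, t)
\end{aligned}
\]
  $\dagger$: $\Leftarrow$: trivial. $\Rightarrow$: Assume a model $M_\varphi$ for
  $enc(t') \models \varphi[\pi, e, 0] \land A$. By construction, constraints by
  $\varphi$ do not share variables with constraints by $\psi$. We extend the model
  by assigning $v_{\psi',1}$ with $\bot$, for all $v_{\psi',1} \in \psi[\pi, e, 0]$,
  and assigning the rest of the variables in $\psi[\pi, e, 0]$ arbitrarily.

— $\bigcirc\varphi$:
  $t' \models_{LTL_{fin}} (\bigcirc\varphi)|_t \Leftrightarrow t' \models_{LTL_{fin}} \bigcirc(\varphi|_{t^*}) \Leftrightarrow t'[1\rangle \models_{LTL_{fin}} \varphi|_{t^*} \overset{IH}{\Leftrightarrow} enc(t'[1\rangle) \models constr(\varphi, t^*) \overset{\text{Lem. 3}}{\Leftrightarrow} enc(t') \models constr(\bigcirc\varphi, t)$.

— $\varphi\,\mathcal{U}\,\psi$:
\[
\begin{aligned}
& t' \models_{LTL_{fin}} (\varphi\,\mathcal{U}\,\psi)|_t \\
\Leftrightarrow\ & t' \models_{LTL_{fin}} \psi|_t \lor \big[t' \models_{LTL_{fin}} \varphi|_t \land t' \models_{LTL_{fin}} \bigcirc(\varphi\,\mathcal{U}\,\psi)|_t\big] \\
\Leftrightarrow\ & enc(t') \models constr(\psi, t) \lor \big[enc(t') \models constr(\varphi, t) \land enc(t') \models constr(v_{\varphi\mathcal{U}\psi,1}, t^*)\big] \\
\Leftrightarrow\ & enc(t') \models \Big(\psi[\pi, e, 0] \land \bigwedge_{v_{\psi',1} \in \psi[\pi, e, 0]} v_{\psi',1} \to constr(v_{\psi',1}, t^*)\Big) \\
& \lor \Big[enc(t') \models \Big(\varphi[\pi, e, 0] \land \bigwedge_{v_{\varphi',1} \in \varphi[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*)\Big) \\
& \quad \land\ enc(t') \models \big(v_{\varphi\mathcal{U}\psi,1} \land (v_{\varphi\mathcal{U}\psi,1} \to constr(v_{\varphi\mathcal{U}\psi,1}, t^*))\big)\Big] \\
\overset{\text{same as } \dagger}{\Leftrightarrow}\ & enc(t') \models \big(\psi[\pi, e, 0] \lor (\varphi[\pi, e, 0] \land v_{\varphi\mathcal{U}\psi,1})\big) \\
& \land \bigwedge_{v_{\psi',1} \in \psi[\pi, e, 0]} v_{\psi',1} \to constr(v_{\psi',1}, t^*)
  \land \bigwedge_{v_{\varphi',1} \in \varphi[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*) \\
& \land\ \big(v_{\varphi\mathcal{U}\psi,1} \to constr(v_{\varphi\mathcal{U}\psi,1}, t^*)\big) \\
\Leftrightarrow\ & enc(t') \models (\varphi\,\mathcal{U}\,\psi)[\pi, e, 0] \land \bigwedge_{v_{\varphi',1} \in (\varphi\mathcal{U}\psi)[\pi, e, 0]} v_{\varphi',1} \to constr(v_{\varphi',1}, t^*) \\
\Leftrightarrow\ & enc(t') \models constr(\varphi\,\mathcal{U}\,\psi, t)
\end{aligned}
\]

— $\varphi \land \psi$, $\bigcirc_w\varphi$, and $\varphi\,\mathcal{R}\,\psi$ are proven analogously.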


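Corollary 2. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ in negation
normal form over atomic propositions AP and any finite traces
$t, t' \in \Sigma^+$ it holds that
$t' \in \mathcal{L}(\varphi|_t) \Leftrightarrow enc_{AP}(t') \models constr(\varphi, t)$.


Proof. $t' \in \mathcal{L}(\varphi|_t) \Leftrightarrow t' \models_{LTL_{fin}} \varphi|_t \overset{\text{Thm. 2}}{\Leftrightarrow} enc(t') \models constr(\varphi, t)$.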


Lemma 4. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ in negation
normal form over atomic propositions AP and any finite traces
$t, t' \in \Sigma^+$ it holds that
$enc_{AP}(t') \not\models constr(\varphi, t) \Rightarrow \forall t'' \in \Sigma^+.\ t' \le t'' \Rightarrow enc_{AP}(t'') \not\models constr(\varphi, t)$.


Proof. We prove this by contradiction. We choose $t, t'$ as well as $\varphi$
arbitrarily, but in a way such that $enc(t') \not\models constr(\varphi, t)$ holds.
Assume that there exists a continuation of $t'$, which we call $t''$, for which
$enc(t'') \models constr(\varphi, t)$ holds. So there has to exist a model assigning
truth values to the variables in $constr(\varphi, t)$, such that the constraint
system is consistent. From this model we extract all assigned truth values for
positional variables for positions $|t'|$ to $|t''| - 1$. As $t'$ is a prefix of
$t''$, we can use these truth values to construct a valid model for
$enc(t') \models constr(\varphi, t)$, which is a contradiction.

Fig. 2. Runtime comparison between RVHyper and our constraint-based monitor on a
non-interference specification with traces of varying input size.


Corollary 3. For any $\forall^2$HyperLTL formula $\forall \pi, \pi'. \varphi$ in negation
normal form over atomic propositions AP and any finite set of finite traces
$T \in \mathcal{P}(\Sigma^+)$ and finite trace $t' \in \Sigma^+$ it holds that

$t' \in \bigcap_{t \in T} \mathcal{L}(\varphi|_t) \Leftrightarrow enc_{AP}(t') \models \bigwedge_{t \in T} constr(\varphi, t)$.


Proof. It holds that
$\forall t, t' \in \Sigma^+.\ t \neq t' \to constr(\varphi, t) \neq constr(\varphi, t')$.
The claim follows with the same reasoning as in earlier proofs combined with
Corollary 2.


5 Experimental Evaluation 


We implemented two versions of the algorithm presented in this paper. The first 
implementation encodes the constraint system as a Boolean satisfiability prob- 
lem (SAT), whereas the second one represents it as a (reduced ordered) binary 
decision diagram (BDD). The formula rewriting is implemented in a Maude [8] 
script. The constraint system is solved by either CryptoMiniSat [23] or CUDD 
[22]. All benchmarks were executed on an Intel Core i5-6200U CPU @2.30 GHz 
with 8 GB of RAM. The set of benchmarks chosen for our evaluation is composed
of two benchmarks presented in earlier publications [12,13] plus instances of
guarded invariants, at which our implementations excel.


Non-interference. Non-interference [16,19] is an important information flow
policy demanding that an observer of a system cannot infer any high security
input of a system by observing only low security input and output. Reformulated,
we could also say that all low security outputs $o^{low}$ have to be equal on all
system executions as long as the low security inputs $i^{low}$ of those
executions are the same:
$\forall \pi, \pi'.\ (o^{low}_\pi \leftrightarrow o^{low}_{\pi'})\ \mathcal{W}\ (i^{low}_\pi \nleftrightarrow i^{low}_{\pi'})$.
This class of benchmarks was used to evaluate RVHyper [13], an automata-based
runtime verification tool for HyperLTL formulas. We repeated the experiments and
depict the results in Fig. 2. We chose a trace length of 50 and monitored
non-interference on 1000 randomly generated traces, where we distinguish between
a 64-bit input (left) and a 128-bit input (right). For 64-bit input, our BDD
implementation performs comparably well to RVHyper, which statically constructs
a monitor automaton. For 128-bit input, RVHyper was not able to construct the
automaton in reasonable time. Our implementation, however, shows promising
results for this benchmark class, which pushes the automata-based construction
to its limit.

Fig. 3. Runtime comparison between RVHyper and our constraint-based monitor on
the guarded invariant benchmark with trace length 20 and 20-bit input size.


Table 1. Average results of our implementation compared to RVHyper on traces
generated from circuit instances. Every instance was run 10 times.

instance   # traces   length   time RVHyper   time SAT   time BDD
XOR1             19        5          12 ms      47 ms      49 ms
XOR2           1000        5       16913 ms     996 ms    1666 ms
counter1        961       20        9610 ms    8274 ms     303 ms
counter2       1353       20       19041 ms   13772 ms     437 ms
MUX1           1000        5       14924 ms     693 ms     647 ms
MUX2             80        5         121 ms      79 ms      81 ms


Detecting Spurious Dependencies in Hardware Designs. The problem of whether
input signals influence output signals in hardware designs was considered in
[13]. Formally, we specify this property as the following HyperLTL formula:
$\forall \pi_1 \forall \pi_2.\ (o_{\pi_1} \leftrightarrow o_{\pi_2})\ \mathcal{W}\ (\bar{i}_{\pi_1} \nleftrightarrow \bar{i}_{\pi_2})$,
where $\bar{i}$ denotes all inputs except $i$. Intuitively, the formula asserts
that for every pair of execution traces $(\pi_1, \pi_2)$ the value of $o$ has to
be the same until there is a difference between $\pi_1$ and $\pi_2$ in the input
vector $\bar{i}$, i.e., the inputs on which $o$ may depend. We consider the same
hardware and specifications as in [13]. The results are depicted in Table 1.
Again, the BDD implementation handles this set of benchmarks well. The biggest
difference can be seen between the runtimes for counter2. This is explained by
the fact that this benchmark demands the highest number of observed traces, and
therefore the impact of the quadratic runtime costs in the number of traces
dominates the result. We can, in fact, clearly observe this correlation between
the number of traces and the runtime in RVHyper's performance over all
benchmarks. Our constraint-based implementations, on the other hand, do not show
this behavior.

Fig. 4. Runtime of the SAT-based algorithm on the guarded invariant benchmark
with a varying number of atomic propositions. (x-axis: number of traces, 0 to
1,000; one curve per value of $n_{out}$, with $n_{in} = 50$.)


Guarded Invariants. We consider a new class of benchmarks, called guarded
invariants, which express a certain invariant relation between two traces,
which are, additionally, guarded by a precondition. Figure 3 shows the results of
monitoring an arbitrary invariant $P : \Sigma \to \mathbb{B}$ of the following form:
$\forall \pi, \pi'.\ \Diamond(\bigvee_{i \in I} i_\pi \nleftrightarrow i_{\pi'}) \lor \Box(P(\pi) \leftrightarrow P(\pi'))$.
Our approach significantly outperforms RVHyper on this benchmark class, as the
conjunct splitting optimization, described in Sect. 4.1, synergizes well with
SAT-solver implementations.


Atomic Proposition Scalability. While RVHyper is inherently limited in its
scalability concerning formula size, as the construction of the deterministic
monitor automaton gets increasingly hard, the rewrite-based solution is not
affected by this limitation. To put this to the test, we ran the SAT-based
implementation on guarded invariant formulas with up to 100 different atomic
propositions. Formulas have the form
$\forall \pi, \pi'.\ (\bigwedge_{i=1}^{n_{in}} (in_{i,\pi} \leftrightarrow in_{i,\pi'})) \to \Box(\bigvee_{j=1}^{n_{out}} (out_{j,\pi} \leftrightarrow out_{j,\pi'}))$,
where $n_{in}$ and $n_{out}$ represent the number of input and output atomic
propositions, respectively. Results can be seen in Fig. 4. Note that RVHyper
already fails to build monitor automata for $|n_{in} + n_{out}| > 10$.


6 Conclusion 


We pursued the success story of rewrite-based monitors for trace properties by
applying the technique to the runtime verification problem of hyperproperties.
We presented an algorithm that, given a $\forall^2$HyperLTL formula, incrementally
constructs constraints that represent requirements on future traces, instead of
storing traces during runtime. Our evaluation shows that our approach scales in
parameters where existing automata-based approaches reach their limits.

Abstract. Programs with randomization constructs are an active research topic,
especially after the recent introduction of martingale-based analysis methods
for their termination and runtimes. Unlike most of the existing works, which
focus on proving almost-sure termination or estimating the expected runtime, in
this work we study the tail probabilities of runtimes, such as "the execution
takes more than 100 steps with probability at most 1%." To this end, we devise
a theory of supermartingales that overapproximate higher moments of runtime.
These higher moments, combined with a suitable concentration inequality, yield
useful upper bounds of tail probabilities. Moreover, our vector-valued
formulation enables automated template-based synthesis of those
supermartingales. Our experiments suggest the method's practical use.


1 Introduction 


The important roles of randomization in algorithms and software systems are 
nowadays well-recognized. In algorithms, randomization can bring remarkable 
speed gain at the expense of small probabilities of imprecision. In cryptography, 
many encryption algorithms are randomized in order to conceal the identity of 
plaintexts. In software systems, randomization is widely utilized for the purpose 
of fairness, security and privacy. 

Embracing randomization in programming languages has therefore been an 
active research topic for a long time. Doing so does not only offer a solid infras- 
tructure that programmers and system designers can rely on, but also opens 
up the possibility of language-based, static analysis of properties of randomized 
algorithms and systems. 

The current paper's goal is to analyze imperative programs with randomization
constructs. The latter come in two forms, namely probabilistic branching and
assignment from a designated, possibly continuous, distribution. We shall refer
to such programs as randomized programs.¹


Runtime and Termination Analysis of Randomized Programs. The run- 
time of a randomized program is often a problem of our interest; so is almost-sure 
termination, that is, whether the program terminates with probability 1. In the 
programming language community, these problems have been taken up by many 
researchers as a challenge of both practical importance and theoretical interest. 

Most of the existing works on runtime and termination analysis follow either 
of the following two approaches. 


— Martingale-based methods, initiated with a notion of ranking supermartingale
in [4] and extended in [1,6,7,11,13], have their origin in the theory of
stochastic processes. They can also be seen as a probabilistic extension of
ranking functions, a standard proof method for termination of (non-randomized)
programs. Martingale-based methods have seen remarkable success in automated
synthesis using templates and constraint solving (like LP or SDP).

— The predicate-transformer approach, pursued in [2, 17,19], uses a more syntax- 
guided formalism of program logic and emphasizes reasoning by invariants. 


The essential difference between the two approaches is not big: an invariant 
notion in the latter is easily seen to be an adaptation of a suitable notion of 
supermartingale. The work [33] presents a comprehensive account on the order- 
theoretic foundation behind these techniques. 

These existing works are mostly focused on the following problems: decid- 
ing almost-sure termination, computing termination probabilities, and comput- 
ing expected runtime. (Here “computing” includes giving upper/lower bounds.) 
See [33] for a comparison of some of the existing martingale-based methods. 


Our Problem: Tail Probabilities for Runtimes. In this paper we focus on the
problem of tail probabilities, which has not been studied much so far.² We
present a method for overapproximating tail probabilities; here is the problem
we solve.


Input: a randomized program $\Gamma$, and a deadline $d \in \mathbb{N}$
Output: an upper bound of the tail probability $\Pr(T_{run} > d)$, where $T_{run}$
is the runtime of $\Gamma$


Our target language is an imperative language that features randomization
(probabilistic branching and random assignment). We also allow nondeterminism;
this makes the program's runtime depend on the choice of a scheduler (i.e. how
nondeterminism is resolved). In this paper we study the longest, worst-case
runtime (therefore our scheduler is demonic). In the technical sections, we
present these programs as probabilistic control flow graphs (pCFGs), as is usual
in the literature; see e.g. [1,33].


1 With the rise of statistical machine learning, probabilistic programs attract a lot 
of attention. Randomized programs can be thought of as a fragment of probabilis- 
tic programs without conditioning (or observation) constructs. In other words, the 
Bayesian aspect of probabilistic programs is absent in randomized programs. 

2 An exception is [5]; see Sect. 7 for comparison with the current work.

An example of our target program is in Fig. 1. It is an imperative program with
randomization: in line 3, the value of z is sampled from the uniform
distribution over the interval [−2, 1]. The symbol * in line 4 stands for a
nondeterministic Boolean value; in our analysis, it is resolved so that the
runtime becomes the longest.

1  x := 2; y := 2;
2  while (x > 0 && y > 0) do
3    z := Unif(-2,1);
4    if * then
5      x := x + z
6    else
7      y := y + z
8    fi
9  od

Fig. 1. An example program

Given the program in Fig. 1 and a choice of a deadline (say d = 400), we can ask
the question "what is the probability $\Pr(T_{run} > d)$ for the runtime
$T_{run}$ of the program to exceed d = 400 steps?" As we show in Sect. 6, our
method gives a guaranteed upper bound 0.0684. This means that, if we allow the
time budget of d = 400 steps, the program terminates with probability at least
93%.


  a randomized program $\Gamma$
      |
      |  step 1: template-based synthesis of vector-valued supermartingales (§3, §5)
      v
  upper bounds of higher moments $E[T_{run}], \ldots, E[(T_{run})^K]$
      |
      |  (together with a deadline $d$)
      |  step 2: calculation via a concentration inequality (§4)
      v
  an upper bound of the tail probability $\Pr(T_{run} > d)$

Fig. 2. Our workflow


Our Method: Concentration Inequalities, Higher Moments, and 
Vector-Valued Supermartingales. Towards the goal of computing tail prob- 
abilities, our approach is to use concentration inequalities, a technique from 
probability theory that is commonly used for overapproximating various tail 
probabilities. There are various concentration inequalities in the literature, and 
each of them is applicable in a different setting, such as a nonnegative ran- 
dom variable (Markov’s inequality), known mean and variance (Chebyshev’s 
inequality), a difference-bounded martingale (Azuma’s inequality), and so on. 
Some of them were used for analyzing randomized programs [5] (see Sect. 7 for 
comparison). 

In this paper, we use a specific concentration inequality that uses higher
moments $E[T_{run}], \ldots, E[(T_{run})^K]$ of runtimes $T_{run}$, up to a choice
of the maximum degree $K$. The concentration inequality is taken from [3]; it
generalizes Markov's and Chebyshev's. We observe that a higher moment yields a
tighter bound of the tail probability, as the deadline $d$ grows bigger.
Therefore it makes sense to strive for computing higher moments.
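
To see why higher moments help for large deadlines, consider the elementary moment form of Markov's inequality, $\Pr(T \ge d) \le E[T^k]/d^k$ for a nonnegative $T$. The sketch below uses this simpler bound as a stand-in, not the sharper inequality from [3], and the moment values are made-up placeholders rather than results of the paper.

```python
from typing import Sequence


def moment_tail_bound(moments: Sequence[float], d: float) -> float:
    """Best bound on Pr(T >= d) from upper bounds on E[T], ..., E[T^K].

    moments[k-1] is an upper bound on E[T^k] for a nonnegative T; Markov's
    inequality applied to T^k gives Pr(T >= d) <= E[T^k] / d^k.
    """
    if d <= 0:
        return 1.0
    candidates = [m / d ** k for k, m in enumerate(moments, start=1)]
    return min(1.0, min(candidates))


# Hypothetical moment bounds (not taken from the paper's experiments).
bounds = [10.0, 150.0]                     # E[T] <= 10, E[T^2] <= 150
small = moment_tail_bound(bounds, 12.0)    # first moment gives the better bound
large = moment_tail_bound(bounds, 100.0)   # second moment gives the better bound
```

For the small deadline the first moment yields the smaller bound (10/12), while for the large deadline the second moment does (150/100²), because the denominator $d^k$ grows faster for larger $k$.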

For computing higher moments of runtimes, we systematically extend the
existing theory of ranking supermartingales, from the expected runtime (i.e. the
first moment) to higher moments. The theory features a vector-valued
supermartingale, which not only generalizes easily to degrees up to arbitrary
$K \in \mathbb{N}$, but also allows automated synthesis much like usual
supermartingales.

We also claim that the soundness of these vector-valued supermartingales is
proved in a mathematically clean manner. Following our previous work [33], our
arguments are based on the order-theoretic foundation of fixed points (namely
the Knaster-Tarski, Cousot-Cousot and Kleene theorems), and we give upper
bounds of higher moments by suitable least fixed points.

Overall, our workflow is as shown in Fig. 2. We note that step 2 in Fig. 2 is
computationally much cheaper than step 1: in fact, step 2 yields a symbolic
expression for an upper bound in which $d$ is a free variable. This makes it
possible to draw graphs like the ones in Fig. 3. It is also easy to find a
deadline $d$ for which $\Pr(T_{run} > d)$ is below a given threshold $p \in [0,1]$.

We implemented a prototype that synthesizes vector-valued supermartingales 
using linear and polynomial templates. The resulting constraints are solved by 
LP and SDP solvers, respectively. Experiments show that our method can pro- 
duce nontrivial upper bounds in reasonable computation time. We also experi- 
mentally confirm that higher moments are useful in producing tighter bounds. 


Our Contributions. Summarizing, the contribution of this paper is as follows. 


— We extend the existing theory of ranking supermartingales from expected 
runtimes (i.e. the first moment) to higher moments. The extension has a solid 
foundation of order-theoretic fixed points. Moreover, its clean presentation by 
vector-valued supermartingales makes automated synthesis as easy as before. 
Our target randomized programs are rich, embracing nondeterminism and 
continuous distributions. 

— We study how these vector-valued supermartingales (and the resulting upper 
bounds of higher moments) can be used to yield upper bounds of tail probabil- 
ities of runtimes. We identify a concentration lemma that suits this purpose. 
We show that higher moments indeed yield tighter bounds. 

— Overall, we present a comprehensive language-based framework for overap- 
proximating tail probabilities of runtimes of randomized programs (Fig. 2). It 
has been implemented, and our experiments suggest its practical use. 


Organization. We give preliminaries in Sect. 2. In Sect. 3, we review the order- 
theoretic characterization of ordinary ranking supermartingales and present an 
extension to higher moments of runtimes. In Sect. 4, we discuss how to obtain
an upper bound of the tail probability of runtimes. In Sect. 5, we explain an
automated synthesis algorithm for our ranking supermartingales. In Sect. 6, we 
give experimental results. In Sect. 7, we discuss related work. We conclude and 
give future work in Sect. 8. Some proofs and details are deferred to the appendices 
available in the extended version [22]. 


2 Preliminaries 


We present some preliminary materials, including the definition of pCFGs (we
use them as a model of randomized programs) and the definition of runtime.

Given topological spaces $X$ and $Y$, let $\mathcal{B}(X)$ be the set of Borel sets
on $X$ and $\mathcal{B}(X, Y)$ be the set of Borel measurable functions $X \to Y$.
We assume that the set $\mathbb{R}$ of reals, a finite set $L$ and the set
$[0, \infty]$ are equipped with the usual topology, the discrete topology, and the
order topology, respectively. We use the induced Borel structures for these
spaces. Given a measurable space $X$, let $\mathcal{D}(X)$ be the set of
probability measures on $X$. For any $\mu \in \mathcal{D}(X)$, let $supp(\mu)$ be
the support of $\mu$. We write $\mathbb{E}[X]$ for the expectation of a random
variable $X$.

Our use of pCFGs follows recent works including [1]. 


Definition 2.1 (pCFG). A probabilistic control flow graph (pCFG) is a tuple
$\Gamma = (L, V, l_{init}, \boldsymbol{x}_{init}, \mapsto, Up, Pr, G)$ that consists of
the following.


— A finite set $L$ of locations. It is a disjoint union of sets $L_D$, $L_P$,
  $L_N$ and $L_A$ of deterministic, probabilistic, nondeterministic and
  assignment locations.

— A finite set $V$ of program variables.

— An initial location $l_{init} \in L$.

— An initial valuation $\boldsymbol{x}_{init} \in \mathbb{R}^V$.

— A transition relation ${\mapsto} \subseteq L \times L$ which is total (i.e.
  $\forall l.\ \exists l'.\ l \mapsto l'$).

— An update function
  $Up : L_A \to V \times (\mathcal{B}(\mathbb{R}^V, \mathbb{R}) \cup \mathcal{D}(\mathbb{R}) \cup \mathcal{B}(\mathbb{R}))$
  for assignment.

— A family $Pr = (Pr_l)_{l \in L_P}$ of probability distributions, where
  $Pr_l \in \mathcal{D}(L)$, for probabilistic locations. We require that
  $l' \in supp(Pr_l)$ implies $l \mapsto l'$.

— A guard function $G : L_D \times L \to \mathcal{B}(\mathbb{R}^V)$ such that for each
  $l \in L_D$ and $\boldsymbol{x} \in \mathbb{R}^V$, there exists a unique location
  $l' \in L$ satisfying $l \mapsto l'$ and $\boldsymbol{x} \in G(l, l')$.


The update function can be decomposed into three functions
$Up_D : L_{AD} \to V \times \mathcal{B}(\mathbb{R}^V, \mathbb{R})$,
$Up_P : L_{AP} \to V \times \mathcal{D}(\mathbb{R})$ and
$Up_N : L_{AN} \to V \times \mathcal{B}(\mathbb{R})$, under a suitable decomposition
$L_A = L_{AD} \cup L_{AP} \cup L_{AN}$ of assignment locations. The elements of
$L_{AD}$, $L_{AP}$ and $L_{AN}$ represent deterministic, probabilistic and
nondeterministic assignments, respectively. See e.g. [33].


An example of a pCFG, modeling the program in Fig. 1, is shown in the
accompanying figure. The node $l_4$ is a nondeterministic location. Unif(−2, 1)
is the uniform distribution on the interval [−2, 1].

[Figure: pCFG of the program in Fig. 1.]

A configuration of a pCFG $\Gamma$ is a pair
$(l, \boldsymbol{x}) \in L \times \mathbb{R}^V$ of a location and a valuation. We regard
the set $S = L \times \mathbb{R}^V$ of configurations as equipped with the product
topology, where $L$ is equipped with the discrete topology. We say a
configuration $(l', \boldsymbol{x}')$ is a successor of $(l, \boldsymbol{x})$ if
$l \mapsto l'$ and the following hold.


— If $l \in L_D$, then $\boldsymbol{x}' = \boldsymbol{x}$ and $\boldsymbol{x} \in G(l, l')$.

— If $l \in L_N \cup L_P$, then $\boldsymbol{x}' = \boldsymbol{x}$.

— If $l \in L_A$, then $\boldsymbol{x}' = \boldsymbol{x}(x_j \leftarrow a)$, where
  $\boldsymbol{x}(x_j \leftarrow a)$ denotes the vector obtained by replacing the
  $x_j$-component of $\boldsymbol{x}$ by $a$. Here $x_j$ is such that
  $Up(l) = (x_j, u)$, and $a$ is chosen as follows: (1) $a = u(\boldsymbol{x})$ if
  $u \in \mathcal{B}(\mathbb{R}^V, \mathbb{R})$; (2) $a \in supp(u)$ if
  $u \in \mathcal{D}(\mathbb{R})$; and (3) $a \in u$ if $u \in \mathcal{B}(\mathbb{R})$.


An invariant of a pCFG $\Gamma$ is a measurable set $I \in \mathcal{B}(S)$ such that
$(l_{init}, \boldsymbol{x}_{init}) \in I$ and $I$ is closed under taking successors
(i.e. if $c \in I$ and $c'$ is a successor of $c$ then $c' \in I$). Use of
invariants is a common technique in automated synthesis of supermartingales [1]:
it restricts configuration spaces and thus makes the constraints on
supermartingales weaker. It is also common to take an invariant as a measurable
set [1]. A run of $\Gamma$ is an infinite sequence of configurations
$c_0 c_1 \ldots$ such that $c_0$ is the initial configuration
$(l_{init}, \boldsymbol{x}_{init})$ and $c_{i+1}$ is a successor of $c_i$ for each
$i$. Let $Run(\Gamma)$ be the set of runs of $\Gamma$.

A scheduler resolves nondeterminism: at a location in $L_N \cup L_{AN}$, it
chooses a distribution of next configurations depending on the history of
configurations visited so far. Given a pCFG $\Gamma$ and a scheduler $\sigma$ of
$\Gamma$, a probability measure $\nu^\Gamma_\sigma$ on $Run(\Gamma)$ is defined in the
usual manner. See [22, Appendix B] for details.

Definition 2.2 (reaching time $T^\Gamma_C$, $T^\Gamma_{C,\sigma}$). Let $\Gamma$ be a pCFG
and $C \subseteq S$ be a set of configurations called a destination. The reaching
time to $C$ is a function $T^\Gamma_C : Run(\Gamma) \to [0, \infty]$ defined by
$(T^\Gamma_C)(c_0 c_1 \ldots) = \operatorname{argmin}_{i \in \mathbb{N}} (c_i \in C)$. Fixing
a scheduler $\sigma$ makes $T^\Gamma_C$ a random variable, since $\sigma$ determines a
probability measure $\nu^\Gamma_\sigma$ on $Run(\Gamma)$. It is denoted by
$T^\Gamma_{C,\sigma}$.


Runtimes of pCFGs are a special case of reaching times, namely to the set 
of terminating configurations. 

The following higher moments are central to our framework. Recall that we 
are interested in demonic schedulers, i.e. those which make runtimes longer. 


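Definition 2.3 ($M^{\Gamma,k}_{C,\sigma}$ and $\overline{M}^{\Gamma,k}_C$). Assume the
setting of Definition 2.2, and let $k \in \mathbb{N}$ and $c \in S$. We write
$M^{\Gamma,k}_{C,\sigma}(c)$ for the $k$-th moment of the reaching time of $\Gamma$ from
$c$ to $C$ under the scheduler $\sigma$, that is,
$M^{\Gamma,k}_{C,\sigma}(c) = \mathbb{E}[(T^{\Gamma_c}_{C,\sigma})^k] = \int (T^{\Gamma_c}_C)^k \, d\nu^{\Gamma_c}_\sigma$,
where $\Gamma_c$ is a pCFG obtained from $\Gamma$ by changing the initial
configuration to $c$. Their supremum under varying $\sigma$ is denoted by
$\overline{M}^{\Gamma,k}_C := \sup_\sigma M^{\Gamma,k}_{C,\sigma}$.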


3 Ranking Supermartingale for Higher Moments 


We introduce one of the main contributions in the paper, a notion of rank- 
ing supermartingale that overapproximates higher moments. It is motivated by 
the following observation: martingale-based reasoning about the second moment 
must concur with one about the first moment. We conduct a systematic theo- 
retical extension that features an order-theoretic foundation and vector-valued 
supermartingales. The theory accommodates nondeterminism and continuous 
distributions, too. We omit some details and proofs; they are in [22, Appendix C]. 
The fully general theory for higher moments will be presented in Sect. 3.2; 
we present its restriction to the second moments in Sect. 3.1 for readability. 
Prior to these, we review the existing theory of ranking supermartingales, 
through the lens of order-theoretic fixed points. In doing so we follow [33]. 


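Definition 3.1 ("nexttime" operation $\mathbb{X}$ (pre-expectation)). Given
$\eta : S \to [0, \infty]$, let $\mathbb{X}\eta : S \to [0, \infty]$ be the function
defined as follows.

— If $l \in L_D$ and $\boldsymbol{x} \in G(l, l')$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \eta(l', \boldsymbol{x})$.

— If $l \in L_P$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \sum_{l \mapsto l'} Pr_l(l')\, \eta(l', \boldsymbol{x})$.

— If $l \in L_N$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \sup_{l \mapsto l'} \eta(l', \boldsymbol{x})$.

— If $l \in L_A$, $Up(l) = (x_j, u)$ and $l \mapsto l'$: if
  $u \in \mathcal{B}(\mathbb{R}^V, \mathbb{R})$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \eta(l', \boldsymbol{x}(x_j \leftarrow u(\boldsymbol{x})))$;
  if $u \in \mathcal{D}(\mathbb{R})$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \int_{\mathbb{R}} \eta(l', \boldsymbol{x}(x_j \leftarrow y)) \, du(y)$;
  and if $u \in \mathcal{B}(\mathbb{R})$, then
  $(\mathbb{X}\eta)(l, \boldsymbol{x}) = \sup_{y \in u} \eta(l', \boldsymbol{x}(x_j \leftarrow y))$.


Intuitively, $\mathbb{X}\eta$ is the expectation of $\eta$ after one transition.
Nondeterminism is resolved by the maximal choice.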
We define $F_1 : (S \to [0, \infty]) \to (S \to [0, \infty])$ as follows, where "1+"
accounts for time elapse.

$(F_1(\eta))(c) = 1 + (\mathbb{X}\eta)(c)$ if $c \in I \setminus C$, and
$(F_1(\eta))(c) = 0$ otherwise.

The function $F_1$ is an adaptation of the Bellman operator, a classic notion in
the theory of Markov processes. A similar notion is used e.g. in [19]. The
function space $(S \to [0, \infty])$ is a complete lattice, because $[0, \infty]$ is;
moreover $F_1$ is easily seen to be monotone. It is not hard to see either that
the expected reaching time $\overline{M}^{\Gamma,1}_C$ to $C$ coincides with the
least fixed point $\mu F_1$.
The following theorem is fundamental in theoretical computer science.


Theorem 3.2 (Knaster—Tarski, [34]). Let (L,<) be a complete lattice and 
f:L— L be a monotone function. The least fixed point uf is the least prefixed 
point, i.e. uf = min{l E L| f(D) < th. 


The significance of the Knaster–Tarski theorem in verification lies in the induced
proof rule: $f(l) \le l \Rightarrow \mu f \le l$. Instantiating to the expected reaching time
$\overline{\mathbb{M}}^{C,1} = \mu F_1$, it means $F_1(\eta) \le \eta \Rightarrow \overline{\mathbb{M}}^{C,1} \le \eta$, i.e. an arbitrary
prefixed point of $F_1$ (which coincides with the notion of ranking supermartingale [4])
overapproximates the expected reaching time. This proves soundness of ranking supermartingales.
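
The proof rule can be tried out on a toy instance. The following Python sketch uses a small finite Markov chain as a stand-in for a pCFG (no nondeterminism, no program variables); the chain, the function names `F1`, `kleene` and `is_prefixed`, and the candidate values are all illustrative assumptions rather than anything taken from the paper. It computes Kleene iterates of $F_1$ from the bottom element and checks that prefixed points dominate them.

```python
from fractions import Fraction as Fr

# Hypothetical chain: from "a" we reach the target "t" with probability 1/2
# and otherwise stay in "a"; from "b" we surely move to "a". Here C = {"t"}.
STEP = {
    "a": {"t": Fr(1, 2), "a": Fr(1, 2)},
    "b": {"a": Fr(1)},
}
TARGET = {"t"}
STATES = ["a", "b", "t"]

def F1(eta):
    """One application of F_1: 1 + (X eta)(c) outside C, and 0 on C."""
    out = {}
    for c in STATES:
        if c in TARGET:
            out[c] = Fr(0)
        else:
            out[c] = 1 + sum(p * eta[c2] for c2, p in STEP[c].items())
    return out

def kleene(n):
    """The n-th iterate F_1^n(bottom), which under-approximates mu F_1."""
    eta = {c: Fr(0) for c in STATES}
    for _ in range(n):
        eta = F1(eta)
    return eta

def is_prefixed(eta):
    """Check F_1(eta) <= eta pointwise."""
    image = F1(eta)
    return all(image[c] <= eta[c] for c in STATES)

approx = kleene(60)
exact = {"a": Fr(2), "b": Fr(3), "t": Fr(0)}   # a fixed point of F_1
loose = {"a": Fr(5), "b": Fr(7), "t": Fr(0)}   # a coarser prefixed point
```

Both `exact` and `loose` pass `is_prefixed`, and both sit above every Kleene iterate, which is what the rule $f(l) \le l \Rightarrow \mu f \le l$ predicts.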


3.1 Ranking Supermartingales for the Second Moments 


We extend ranking supermartingales to the second moments. It paves the way 
to a fully general theory (up to the K-th moments) in Sect. 3.2. 

The key in the martingale-based reasoning of expected reaching times (i.e.
first moments) was that they are characterized as the least fixed point of a
function $F_1$. Here it is crucial that for an arbitrary random variable T, we have
$\mathbb{E}[T + 1] = \mathbb{E}[T] + 1$ and therefore we can calculate $\mathbb{E}[T + 1]$ from $\mathbb{E}[T]$. However,
this is not the case for second moments. As $\mathbb{E}[(T+1)^2] = \mathbb{E}[T^2] + 2\mathbb{E}[T] + 1$,
calculating the second moment requires not only $\mathbb{E}[T^2]$ but also $\mathbb{E}[T]$. This
encourages us to define a vector-valued supermartingale.


Definition 3.3 (time-elapse function $\mathrm{El}_1$). A function $\mathrm{El}_1 : [0, \infty]^2 \to [0, \infty]^2$
is defined by $\mathrm{El}_1(x_1, x_2) = (x_1 + 1,\ x_2 + 2x_1 + 1)$.
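
As a quick sanity check of this definition, the sketch below compares $\mathrm{El}_1$ against the moments of $T + 1$ computed directly. The runtime distribution `T` is a made-up finite example, and the helper names are assumptions introduced only for this illustration.

```python
from fractions import Fraction as Fr

def el1(x1, x2):
    """Time-elapse function El_1: (E[T], E[T^2]) -> (E[T+1], E[(T+1)^2])."""
    return (x1 + 1, x2 + 2 * x1 + 1)

def moments(dist):
    """First and second moments of a finite distribution {value: probability}."""
    m1 = sum(p * t for t, p in dist.items())
    m2 = sum(p * t * t for t, p in dist.items())
    return (m1, m2)

# Hypothetical runtime: 0, 3 or 10 steps.
T = {0: Fr(1, 2), 3: Fr(1, 4), 10: Fr(1, 4)}
T_plus_1 = {t + 1: p for t, p in T.items()}

lhs = el1(*moments(T))      # moments of T pushed through El_1
rhs = moments(T_plus_1)     # moments of T + 1 computed directly
```

Here `lhs` and `rhs` agree, reflecting the identity $\mathbb{E}[(T+1)^2] = \mathbb{E}[T^2] + 2\mathbb{E}[T] + 1$.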


Then, an extension of $F_1$ for second moments can be defined as a combination
of the time-elapse function $\mathrm{El}_1$ and the pre-expectation X.

Definition 3.4 ($F_2$). Let I be an invariant and $C \subseteq I$ be a Borel set. We define
$F_2 : (S \to [0, \infty]^2) \to (S \to [0, \infty]^2)$ by

$$(F_2(\eta))(c) = \begin{cases} (X(\mathrm{El}_1 \circ \eta))(c) & c \in I \setminus C \\ (0, 0) & \text{otherwise.} \end{cases}$$

Here X is applied componentwise: $(X(\eta_1, \eta_2))(c) = ((X\eta_1)(c), (X\eta_2)(c))$.


We can extend the complete lattice structure of $[0, \infty]$ to the function space
$S \to [0, \infty]^2$ in a pointwise manner. It is routine to prove that $F_2$ is monotone
with respect to this complete lattice structure. Hence $F_2$ has a least fixed
point. In fact, while $\overline{\mathbb{M}}^{C,1}$ was characterized as the least fixed point of $F_1$, the tuple
$(\overline{\mathbb{M}}^{C,1}, \overline{\mathbb{M}}^{C,2})$ is not the least fixed point of $F_2$ (cf. Example 3.8 and Theorem 3.9).
However, the least fixed point of $F_2$ overapproximates the tuple of moments.


Theorem 3.5. For any configuration $c \in I$, $(\mu F_2)(c) \ge (\overline{\mathbb{M}}^{C,1}(c), \overline{\mathbb{M}}^{C,2}(c))$.


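Let $T^{C}_{\sigma,n} = \min\{n, T^{C}_{\sigma}\}$. To prove the above theorem, we inductively prove

$$(F_2^{\,n}(\bot))(c) \ge \left(\mathbb{E}[T^{C}_{\sigma,n}],\ \mathbb{E}[(T^{C}_{\sigma,n})^2]\right)$$

for each σ and n, where the expectations are taken for runs from c, and take the supremum.
See [22, Appendix C] for more details.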
Like a ranking supermartingale for first moments, a ranking supermartingale for
second moments is defined as a prefixed point of $F_2$, i.e. a function $\eta$ such that
$\eta \ge F_2(\eta)$. However, we modify the definition for the sake of implementation.


Definition 3.6 (ranking supermartingale for second moments). A ranking
supermartingale for second moments is a function $\eta : S \to \mathbb{R}^2$ such that: (i)
$\eta(c) \ge (X(\mathrm{El}_1 \circ \eta))(c)$ for each $c \in I \setminus C$; and (ii) $\eta(c) \ge 0$ for each $c \in I$.


Here, the time-elapse function El; captures a positive decrease of the ranking 
supermartingale. Even though we only have inequality in Theorem 3.5, we can 
prove the following desired property of our supermartingale notion. 


Theorem 3.7. If $\eta : S \to \mathbb{R}^2$ is a supermartingale for second moments, then
$(\overline{\mathbb{M}}^{C,1}(c), \overline{\mathbb{M}}^{C,2}(c)) \le \eta(c)$ for each $c \in I$.
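
To illustrate Definition 3.6, the following sketch checks conditions (i) and (ii) for a hand-derived candidate on a hypothetical integer random walk: while x > 0, the program sets x := x - 1 with probability 3/4 and x := x + 1 otherwise, with I = {x ≥ 0}, C = {x = 0}, and one transition per loop iteration. The walk, the candidate $(2x, 4x^2 + 6x)$ and all function names are assumptions made for this illustration; the check is done on finitely many configurations only.

```python
from fractions import Fraction as Fr

P_DOWN = Fr(3, 4)   # assumed probability of the step x := x - 1

def el1(v):
    """Time-elapse function El_1 on a pair (first moment, second moment)."""
    return (v[0] + 1, v[1] + 2 * v[0] + 1)

def pre_expectation(f, x):
    """(X f)(x) for the walk, applied componentwise to a pair-valued f."""
    down, up = f(x - 1), f(x + 1)
    return tuple(P_DOWN * d + (1 - P_DOWN) * u for d, u in zip(down, up))

def satisfies_def_3_6(eta, x):
    """Conditions (i) and (ii) of Definition 3.6 at configuration x >= 0."""
    if any(v < 0 for v in eta(x)):            # (ii): nonnegativity on I
        return False
    if x == 0:                                # x in C: nothing more to check
        return True
    rhs = pre_expectation(lambda y: el1(eta(y)), x)
    return all(l >= r for l, r in zip(eta(x), rhs))   # (i): on I \ C

def candidate(x):
    """Hand-derived candidate eta(x) = (2x, 4x^2 + 6x)."""
    return (2 * x, 4 * x * x + 6 * x)

def too_small(x):
    """Nonnegative, but leaves no room for the time-elapse term in (i)."""
    return (x, x * x)

ok = all(satisfies_def_3_6(candidate, x) for x in range(200))
bad = all(satisfies_def_3_6(too_small, x) for x in range(200))
```

For this walk the candidate satisfies (i) with equality at every checked configuration, so by Theorem 3.7 its components bound the first and second moments of the runtime from above; `too_small` already fails at x = 1.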


The following example and theorem show that we cannot replace ≥ with =
in Theorem 3.5 in general, but it is possible in the absence of nondeterminism.


Example 3.8. The accompanying figure (not reproduced here) shows a pCFG such that
$l_2 \in L_P$ and all the other locations are in $L_N$, the initial location is $l_0$ and
$l_{12}$ is a terminating location. For the pCFG, the left-hand side of the inequality in
Theorem 3.5 is $\mu F_2(l_0) = (6, 37.5)$. In contrast, if a scheduler σ takes a transition
from $l_1$ to $l_2$ with probability p, $(\mathbb{M}^{C,1}_{\sigma}(l_0), \mathbb{M}^{C,2}_{\sigma}(l_0)) = (6 - 4p, 36 - 3p)$. Hence
the right-hand side is $(\overline{\mathbb{M}}^{C,1}(l_0), \overline{\mathbb{M}}^{C,2}(l_0)) = (6, 36)$.

Theorem 3.9. If $L_N = L_{AN} = \emptyset$, then for each $c \in I$, $(\mu F_2)(c) = (\overline{\mathbb{M}}^{C,1}(c), \overline{\mathbb{M}}^{C,2}(c))$.
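
The gap between the least fixed point and the true moments can be reproduced on a much smaller system. The sketch below uses a hypothetical acyclic MDP (not the pCFG of Example 3.8) with a single nondeterministic node `s`; the graph, the node names and the helper functions are assumptions for illustration. Because the supremum in X is taken separately in each component, the first component may follow one branch and the second component another.

```python
from fractions import Fraction as Fr

# Hypothetical acyclic MDP. "det"/"prob" nodes list (successor, probability);
# the "nondet" node lists its possible successors. "T" is the only target.
MDP = {
    "i":  ("det",    [("s", Fr(1))]),
    "s":  ("nondet", ["a1", "b"]),
    "a1": ("det",    [("a2", Fr(1))]),
    "a2": ("det",    [("T", Fr(1))]),
    "b":  ("prob",   [("T", Fr(3, 4)), ("c1", Fr(1, 4))]),
    "c1": ("det",    [("c2", Fr(1))]),
    "c2": ("det",    [("c3", Fr(1))]),
    "c3": ("det",    [("T", Fr(1))]),
}

def el1(v):
    return (v[0] + 1, v[1] + 2 * v[0] + 1)

def F2(eta, choice=None):
    """One application of F_2. If `choice` is given, the nondeterministic
    node follows that successor (a pure scheduler) instead of taking sup."""
    out = {"T": (Fr(0), Fr(0))}
    for c, (kind, succ) in MDP.items():
        if kind == "nondet":
            options = [choice] if choice else succ
            vals = [el1(eta[c2]) for c2 in options]
            out[c] = (max(v[0] for v in vals), max(v[1] for v in vals))
        else:
            vals = [(p, el1(eta[c2])) for c2, p in succ]
            out[c] = (sum(p * v[0] for p, v in vals),
                      sum(p * v[1] for p, v in vals))
    return out

def fixed_point(choice=None):
    eta = {c: (Fr(0), Fr(0)) for c in list(MDP) + ["T"]}
    for _ in range(len(MDP) + 1):      # enough rounds: the graph is acyclic
        eta = F2(eta, choice)
    return eta

lfp = fixed_point()["i"]          # (mu F_2)(i) = (4, 16.25)
via_a = fixed_point("a1")["i"]    # moments when the scheduler picks a1
via_b = fixed_point("b")["i"]     # moments when the scheduler picks b
best = (max(via_a[0], via_b[0]), max(via_a[1], via_b[1]))
```

In this toy system the moments under a randomized choice at `s` are convex combinations of `via_a` and `via_b`, so `best` is the componentwise supremum over schedulers; `lfp` is strictly larger in its second component.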


3.2 Ranking Supermartingales for the Higher Moments 


We extend the result in Sect. 3.1 to moments higher than second. 
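Firstly, the time-elapse function $\mathrm{El}_1$ is generalized as follows.


Definition 3.10 (time-elapse function $\mathrm{El}_1^{K,k}$). For $K \in \mathbb{N}$ and $k \in \{1, \ldots, K\}$,
a function $\mathrm{El}_1^{K,k} : [0, \infty]^K \to [0, \infty]$ is defined by
$\mathrm{El}_1^{K,k}(x_1, \ldots, x_K) = 1 + \sum_{j=1}^{k} \binom{k}{j} x_j$. Here $\binom{k}{j}$ is the binomial coefficient.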
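
The general time-elapse function is the binomial expansion of $\mathbb{E}[(T+1)^k]$ in terms of the lower moments. A minimal Python sketch, assuming a made-up finite runtime distribution `T` and helper names chosen for this illustration:

```python
from fractions import Fraction as Fr
from math import comb

def el(K, k, xs):
    """El_1^{K,k}(x_1, ..., x_K) = 1 + sum_{j=1..k} C(k, j) * x_j."""
    assert len(xs) == K and 1 <= k <= K
    return 1 + sum(comb(k, j) * xs[j - 1] for j in range(1, k + 1))

def elapse(xs):
    """All K components at once: moments of T -> moments of T + 1."""
    K = len(xs)
    return tuple(el(K, k, xs) for k in range(1, K + 1))

def moments(dist, K):
    """Moments of order 1..K of a finite distribution {value: probability}."""
    return tuple(sum(p * t**k for t, p in dist.items()) for k in range(1, K + 1))

T = {0: Fr(1, 2), 3: Fr(1, 4), 10: Fr(1, 4)}   # hypothetical runtime distribution
shifted = {t + 1: p for t, p in T.items()}     # distribution of T + 1
```

With K = 2 the function `elapse` reduces to $\mathrm{El}_1$ from Definition 3.3, and for any K it maps `moments(T, K)` to `moments(shifted, K)`.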


Again, a monotone function $F_K$ is defined as a combination of the time-elapse
function $\mathrm{El}_1^{K,k}$ and the pre-expectation X.


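Definition 3.11 ($F_K$). Let I be an invariant and $C \subseteq I$ be a Borel set. We
define $F_K : (S \to [0, \infty]^K) \to (S \to [0, \infty]^K)$ by $F_K(\eta)(c) = (F_{K,1}(\eta)(c), \ldots, F_{K,K}(\eta)(c))$,
where $F_{K,k} : (S \to [0, \infty]^K) \to (S \to [0, \infty])$ is given by

$$(F_{K,k}(\eta))(c) = \begin{cases} (X(\mathrm{El}_1^{K,k} \circ \eta))(c) & c \in I \setminus C \\ 0 & \text{otherwise.} \end{cases}$$


As in Definition 3.6, we define a supermartingale as a prefixed point of $F_K$.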


Definition 3.12 (ranking supermartingale for K-th moments). We
define $\eta_1, \ldots, \eta_K : S \to \mathbb{R}$ by $(\eta_1(c), \ldots, \eta_K(c)) = \eta(c)$. A ranking supermartingale
for K-th moments is a function $\eta : S \to \mathbb{R}^K$ such that for each k, (i)
$\eta_k(c) \ge (X(\mathrm{El}_1^{K,k} \circ \eta))(c)$ for each $c \in I \setminus C$; and (ii) $\eta_k(c) \ge 0$ for each $c \in I$.


For higher moments, we can prove an analogous result to Theorem 3.7. 


Theorem 3.13. If $\eta$ is a supermartingale for K-th moments, then for each
$c \in I$, $(\overline{\mathbb{M}}^{C,1}(c), \ldots, \overline{\mathbb{M}}^{C,K}(c)) \le \eta(c)$.


4 From Moments to Tail Probabilities 


We discuss how to obtain upper bounds of tail probabilities of runtimes 
from upper bounds of higher moments of runtimes. Combined with the result 
in Sect. 3, it induces a martingale-based method for overapproximating tail prob- 
abilities. 

We use a concentration inequality. There are many choices of concentration 
inequalities (see e.g. [3]), and we use a variant of Markov’s inequality. We prove 
that the concentration inequality is not only sound but also complete in a sense. 

Formally, our goal is to calculate an upper bound of $\Pr(T^{C}_{\sigma} \ge d)$ for
a given deadline d > 0, under the assumption that we know upper bounds
$u_1, \ldots, u_K$ of moments $\mathbb{E}[T^{C}_{\sigma}], \ldots, \mathbb{E}[(T^{C}_{\sigma})^K]$. In other words, we want to
overapproximate $\sup_{\mu} \mu([d, \infty])$ where $\mu$ ranges over the set of probability measures
on $[0, \infty]$ satisfying $\left(\int x \,\mathrm{d}\mu(x), \ldots, \int x^K \,\mathrm{d}\mu(x)\right) \le (u_1, \ldots, u_K)$.

To answer this problem, we use a generalized form of Markov’s inequality.

Proposition 4.1 (see e.g. [3, §2.1]). Let X be a real-valued random variable
and $\phi$ be a nondecreasing and nonnegative function. For any $d \in \mathbb{R}$ with $\phi(d) > 0$,

$$\Pr(X \ge d) \le \frac{\mathbb{E}[\phi(X)]}{\phi(d)}.$$


By letting $\phi(x) = x^k$ in Proposition 4.1, we obtain the following inequality.
It gives an upper bound of the tail probability that is “tight.”


Proposition 4.2. Let X be a nonnegative random variable. Assume $\mathbb{E}[X^k] \le u_k$
for each $k \in \{0, \ldots, K\}$. Then, for any d > 0,

$$\Pr(X \ge d) \le \min_{0 \le k \le K} \frac{u_k}{d^k}. \qquad (1)$$

Moreover, this upper bound is tight: for any d > 0, there exists a probability
measure for which (1) holds with equality.


Proof. The former part is immediate from Proposition 4.1. For the latter part,
consider $\mu = p\,\delta_d + (1 - p)\,\delta_0$ where $\delta_x$ is the Dirac measure at x and p is the
value of the right-hand side of (1).
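
The bound (1) and the two-point witness from the proof are easy to compute exactly. The sketch below is an illustration under two assumptions: $u_0 = 1$ is used for the trivial k = 0 term, and the function names are invented here. The sample input is the row of upper bounds reported for example (1-2) in Table 1.

```python
from fractions import Fraction as Fr

def tail_bound(upper_moments, d):
    """min over k in {0..K} of u_k / d^k, taking u_0 = 1 for the trivial bound."""
    us = [Fr(1)] + [Fr(u) for u in upper_moments]
    return min(u / Fr(d) ** k for k, u in enumerate(us))

def witness_moments(upper_moments, d):
    """Moments E[X^k], k = 1..K, of mu = p*delta_d + (1-p)*delta_0,
    where p is the value of the bound (1)."""
    p = tail_bound(upper_moments, d)
    return [p * Fr(d) ** k for k in range(1, len(upper_moments) + 1)]

# Upper bounds u_1..u_5 reported for example (1-2) in Table 1.
u = [68, 3124, 171932, 12049876, 1048131068]

first_only = tail_bound(u[:1], 100)   # 68/100
all_five = tail_bound(u, 100)         # attained at k = 5
```

The witness measure puts mass p on d, so its tail probability at d is exactly p, while each of its moments $p\,d^k$ stays below $u_k$ by the choice of p. For a small deadline such as d = 10 the minimum is attained at k = 0 and the bound is the trivial value 1.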


By combining Theorem 3.13 with Proposition 4.2, we obtain the following 
corollary. We can use it for overapproximating tail probabilities. 


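Corollary 4.3. Let $\eta : S \to \mathbb{R}^K$ be a ranking supermartingale for K-th
moments. For each scheduler σ and a deadline d > 0,

$$\Pr(T^{C}_{\sigma} \ge d) \le \min_{0 \le k \le K} \frac{\eta_k(l_{\mathrm{init}}, \vec{x}_{\mathrm{init}})}{d^k}. \qquad (2)$$

Here $\eta_0, \ldots, \eta_K$ are defined by $\eta_0(c) = 1$ and $\eta(c) = (\eta_1(c), \ldots, \eta_K(c))$.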


Note that if K = 1, Corollary 4.3 is essentially the same as [5, Thm 4].
Note also that for each K there exists d > 0 such that
$\frac{\eta_K(l_{\mathrm{init}}, \vec{x}_{\mathrm{init}})}{d^K} = \min_{0 \le k \le K} \frac{\eta_k(l_{\mathrm{init}}, \vec{x}_{\mathrm{init}})}{d^k}$.
Hence higher moments become useful in overapproximating tail probabilities as d gets
large. Later in Sect. 6, we demonstrate this fact experimentally.


5 Template-Based Synthesis Algorithm 


We discuss an automated synthesis algorithm that calculates an upper bound
for the k-th moment of the runtime of a pCFG using a supermartingale in
Definitions 3.6 or 3.12. It takes a pCFG Γ, an invariant I, a set $C \subseteq I$ of
configurations, and a natural number K as input and outputs an upper bound
of the K-th moment.

Our algorithm is adapted from existing template-based algorithms for synthesizing
a ranking supermartingale (for first moments) [4,6,7]. It fixes a linear or
polynomial template with unknown coefficients for a supermartingale and, using
numerical methods like linear programming (LP) or semidefinite programming
(SDP), calculates a valuation of the unknown coefficients so that the axioms of
ranking supermartingale for K-th moments are satisfied.
We hereby briefly explain the algorithms. See [22, Appendix D] for details. 


Linear Template. Our linear template-based algorithm is adapted from [4,7].
We should assume that Γ, I and C are all “linear” in the sense that expressions
appearing in Γ are all linear and I and C are represented by linear inequalities.
To deal with assignments from a distribution like x := Norm(0,1), we also
assume that expected values of distributions appearing in Γ are known.

The algorithm first fixes a template for a supermartingale: for each location
l, it fixes a K-tuple $\left(\sum_{j=1}^{|V|} a^{l}_{j,1} x_j + b^{l}_{1}, \ldots, \sum_{j=1}^{|V|} a^{l}_{j,K} x_j + b^{l}_{K}\right)$ of linear formulas.
Here each $a^{l}_{j,i}$ and $b^{l}_{i}$ are unknown variables called parameters. The algorithm
next collects conditions on the parameters so that the tuples constitute a ranking
supermartingale for K-th moments. It results in a conjunction of formulas
of the form $\varphi_1 \ge 0 \wedge \cdots \wedge \varphi_m \ge 0 \Rightarrow \psi \ge 0$. Here $\varphi_1, \ldots, \varphi_m$ are linear formulas
without parameters and $\psi$ is a linear formula where parameters linearly appear
in the coefficients. By Farkas’ lemma (see e.g. [29, Cor 7.1h]) we can turn such
formulas into linear inequalities over parameters by adding new variables. Their
feasibility is efficiently solvable with an LP solver. We naturally wish to minimize
an upper bound of the K-th moment, i.e. the last component of $\eta(l_{\mathrm{init}}, \vec{x}_{\mathrm{init}})$.
We can minimize it by setting it as the objective function of the LP problem.
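
The role of Farkas’ lemma can be illustrated by checking a certificate: if $\psi = \lambda_0 + \sum_i \lambda_i \varphi_i$ with all $\lambda \ge 0$, then $\varphi_1 \ge 0 \wedge \cdots \wedge \varphi_m \ge 0$ implies $\psi \ge 0$. The sketch below only verifies given multipliers; in the algorithm described above the multipliers would instead be fresh LP variables solved for together with the template parameters. The encoding of linear forms as coefficient lists, the function names and the example walk (x := x - 1 with probability 3/4, x := x + 1 otherwise, template $\eta(x) = ax + b$) are assumptions for illustration.

```python
from fractions import Fraction as Fr

def combine(multipliers, forms):
    """Sum_i lambda_i * phi_i, each form given as [const, coeff_1, ..., coeff_n]."""
    n = len(forms[0])
    return [sum(l * f[i] for l, f in zip(multipliers, forms)) for i in range(n)]

def is_farkas_certificate(premises, psi, multipliers):
    """Check psi = lambda_0 + sum_i lambda_i * phi_i with all lambda >= 0.
    If this holds, (phi_1 >= 0 and ... and phi_m >= 0) implies psi >= 0."""
    if any(l < 0 for l in multipliers):
        return False
    combo = combine(multipliers, premises)
    slack = [p - c for p, c in zip(psi, combo)]
    # The slack must be the constant lambda_0 >= 0, with no variable part left.
    return slack[0] >= 0 and all(s == 0 for s in slack[1:])

a, b = Fr(2), Fr(0)              # a valuation of the template parameters
in_loop = [Fr(-1), Fr(1)]        # premise: x - 1 >= 0
in_invariant = [Fr(0), Fr(1)]    # premise: x >= 0

# Condition (i): eta(x) - 1 - E[eta(x')] = a/2 - 1, a constant in x.
decrease = [a / 2 - 1, Fr(0)]
# Condition (ii): eta(x) = a*x + b >= 0.
nonneg = [b, a]

cert_i = is_farkas_certificate([in_loop], decrease, [Fr(0)])
cert_ii = is_farkas_certificate([in_invariant], nonneg, [a])
```

Both certificates check out for a = 2 and b = 0, which corresponds to the first-moment bound 2x for this walk.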


Polynomial Template. The polynomial template-based algorithm is based
on [6]. This time, Γ, I and C can be “polynomial.” To deal with assignments from
distributions, we assume that the n-th moments of distributions in Γ are easily
calculated for each $n \in \mathbb{N}$. The algorithm is similar to the linear template-based one.

It first fixes a polynomial template for a supermartingale, i.e. it assigns each
location l a K-tuple of polynomial expressions with unknown coefficients. As in
the linear template-based algorithm, the algorithm reduces the axioms of
supermartingale for higher moments to a conjunction of formulas of the form
$\varphi_1 \ge 0 \wedge \cdots \wedge \varphi_m \ge 0 \Rightarrow \psi \ge 0$. This time, each $\varphi_i$ is a polynomial formula
without parameters and $\psi$ is a polynomial formula whose coefficients are linear
formulas over the parameters. In the polynomial case, a conjunction of such formulas
is reduced to an SDP problem using a theorem called Positivstellensatz (we
used a variant called Schmüdgen’s Positivstellensatz [28]). We solve the resulting
problem using an SDP solver, setting $\eta(l_{\mathrm{init}}, \vec{x}_{\mathrm{init}})$ as the objective function.


6 Experiments 


We implemented two programs in OCaml to synthesize a supermartingale based
on (a) a linear template and (b) a polynomial template. The programs translate
a given randomized program to a pCFG and output an LP or SDP problem as
described in Sect. 5. An invariant I and a terminal configuration C for the input
program are specified manually. See e.g. [20] for automatic synthesis of an invariant.
For linear templates, we have used GLPK (v4.65) [12] as an LP solver. For
polynomial templates, we have used SOSTOOLS (v3.03) [31] (a sums of squares
optimization tool that internally uses an SDP solver) on Matlab (R2018b). We
used SDPT3 (v4.0) [30] as an SDP solver. The experiments were carried out on
a Surface Pro 4 with an Intel Core i5-6300U (2.40 GHz) and 8 GB RAM. We
tested our implementation for the following two programs and their variants,
which were also used in the literature [7,19]. Their code is in [22, Appendix E].


Coupon collector’s problem. A probabilistic model of collecting coupons enclosed
in cereal boxes. There exist n types of coupons, and one repeatedly buys cereal
boxes until all the types of coupons are collected. We consider two cases: (1-1)
n = 2 and (1-2) n = 4. We tested the linear template program for them.


Random walk. We used three variants of 1-dimensional random walks: (2-1) 
integer-valued one, (2-2) real-valued one with assignments from continuous 
distributions, (2-3) with adversarial nondeterminism; and two variants of 2- 
dimensional random walks (2-4) and (2-5) with assignments from continuous 
distributions and adversarial nondeterminism. We tested both the linear and 
the polynomial template programs for these examples. 


Experimental results. We measured execution times needed for Step 1 in 
Fig. 2. The results are in Table 1. Execution times are less than 0.2s for lin- 
ear template programs and several minutes for polynomial template programs. 
Upper bounds of tail probabilities obtained from Proposition 4.2 are in Fig. 3. 

We can see that our method is applicable even with nondeterministic branch- 
ing ((2-3), (2-4) and (2-5)) or assignments from continuous distributions ((2-2), 
(2-4) and (2-5)). We can use a linear template for bounding higher moments as 
long as there exists a supermartingale for higher moments representable by linear 
expressions ((1-1), (1-2) and (2-3)). In contrast, for (2-1), (2-2) and (2-4), only 
a polynomial template program found a supermartingale for second moments. 

One would expect the polynomial template program to give a better bound
than the linear one because a polynomial template is more expressive than a
linear one. However, this did not hold for some test cases, probably because of
numerical errors of the SDP solver. For example, (2-1) has a supermartingale
for third moments that can be checked by a hand calculation, but the SDP
solver returned “infeasible” in the polynomial template program. It appears that
our program fails when large numbers are involved (e.g. the third moments of
(2-1), (2-2) and (2-3)). We have also tested a variant of (2-1) where the initial
position is multiplied by 10000. Then the SDP solver returned “infeasible” in
the polynomial template program while the linear template program returned a
nontrivial bound. Hence it seems that numerical errors are likely to occur in the
polynomial template program when large numbers are involved.

Figure 3 shows that the bigger the deadline d is, the more useful higher
moments become (cf. the remark just after Corollary 4.3). For example, in
(1-2), an upper bound of $\Pr(T^{C}_{\sigma} \ge 100)$ calculated from the upper bound of
the first moment is 0.680 while that of the fifth moment is 0.105.
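
For reference, these two values follow from inequality (1) applied to the upper bounds reported for (1-2) in Table 1, namely $u_1 = 68$ and $u_5 = 1048131068$, with d = 100:

```latex
\Pr(T^{C}_{\sigma} \ge 100) \le \frac{u_1}{100} = \frac{68}{100} = 0.680,
\qquad
\Pr(T^{C}_{\sigma} \ge 100) \le \frac{u_5}{100^{5}} = \frac{1048131068}{10^{10}} \approx 0.105 .
```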

To show the merit of our method compared with sampling-based methods, 
we calculated a tail probability bound for a variant of (2-2) (shown in Fig. 4 on 


Fig. 3. Upper bounds of the tail probabilities (except (2-5)). Each gray line is the
value of $u_k / d^k$ where $u_k$ is the best upper bound in Table 1 of k-th moments and d is
a deadline. Each black line is the minimum of gray lines, i.e. the upper bound by
Proposition 4.2. The red lines in (1-1), (1-2) and (2-1) show the true tail probabilities
calculated analytically. The red points in (2-2) show tail probabilities calculated by
Monte Carlo sampling where the number of trials is 100000000. We did not calculate
the true tail probabilities nor approximate them for (2-4) and (2-5) because doing so
seems difficult for these examples due to nondeterminism. (Color figure online)


Table 1. Upper bounds of the moments of runtimes. “-” indicates that the LP or SDP
solver returned “infeasible”. The “degree” column shows the degree of the polynomial
template used in the experiments.

                   (a) linear template          (b) polynomial template
example  moment    upper bound     time (s)     upper bound   time (s)   degree
(1-1)    1st       13              0.012
         2nd       201             0.019
         3rd       3829            0.023
(1-2)    1st       68              0.024
         2nd       3124            0.054
         3rd       171932          0.089
         4th       12049876        0.126
         5th       1048131068      0.191
(2-1)    1st       20              0.024        20.0          24.980     2
         2nd       -               0.013        2320.0        37.609     2
         3rd       -               0.017        -             30.932     3
(2-2)    1st       75              0.009        75.0          33.372     2
         2nd       -               0.014        8375.0        73.514     2
         3rd       -               0.021        -             170.416    3
(2-3)    1st       62              0.020        62.0          40.746     2
         2nd       28605.4         0.038        6710.0        97.156     2
         3rd       19567043.36     0.057        -             35.427     3
(2-4)    1st       96              0.020        95.95         157.748    2
         2nd       -               0.029        10944.0       361.957    2
(2-5)    1st       90              0.022                      143.055    2
         2nd       -               0.042        -             327.202    2


x := 200000000;
while true do
  if prob(0.7) then
    z := Unif(0,1);
    x := x - z
  else
    z := Unif(0,1);
    x := x + z
  fi;
  refute (x < 0)
od

Fig. 4. A variant of (2-2).

p. 12) with a deadline $d = 10^{11}$. Because of its very long expected runtime, a
sampling-based method would not work for it. In contrast, the linear template-based
program gave an upper bound $\Pr(T^{C}_{\sigma} \ge 10^{11}) \le 5000000025/10^{11} \approx 0.05$
in almost the same execution time as (2-2) (< 0.02 s).


7 Related Work 


Martingale-Based Analysis of Randomized Programs. Martingale-based 
methods are widely studied for the termination analysis of randomized pro- 
grams. One of the first is ranking supermartingales, introduced in [4] for prov- 
ing almost sure termination. The theory of ranking supermartingales has since 
been extended actively: accommodating nondeterminism [1,6,7,11], syntax- 
oriented composition of supermartingales [11], proving properties beyond ter- 
mination/reachability [13], and so on. Automated template-based synthesis of 
supermartingales by constraint solving has been pursued, too [1,4,6,7]. 

Other martingale-based methods that are fundamentally different from rank- 
ing supermartingales have been devised, too. They include: different notions of 
repulsing supermartingales for refuting termination (in [8,33]; also studied in 
control theory [32]); and multiply-scaled submartingales for underapproximating 
reachability probabilities [33,36]. See [33] for an overview. 

In the literature on martingale-based methods, the one closest to this work 
is [5]. Among its contribution is the analysis of tail probabilities. It is done by 
either of the following combinations: (1) difference-bounded ranking supermartin- 
gales and the corresponding Azuma’s concentration inequality; and (2) (not nec- 
essarily difference-bounded) ranking supermartingales and Markov’s concentra- 
tion inequality. When we compare these two methods with ours, the first method 
requires repeated martingale synthesis for different parameter values, which can 
pose a performance challenge. The second method corresponds to the restriction 
of our method to the first moment; recall that we showed the advantage of using 
higher moments, theoretically (Sect. 4) and experimentally (Sect. 6). See [22,
Appendix F.1] for detailed discussions. An implementation is lacking in [5], too.

We use Markov’s inequality to calculate an upper bound of $\Pr(T_{\mathrm{run}} \ge d)$ from
a ranking supermartingale. In [7], Hoeffding’s and Bernstein’s inequalities are 
used for the same purpose. As the upper bounds obtained by these inequalities 
are exponentially decreasing with respect to d, they are asymptotically tighter 
than our bound obtained by Markov’s inequality, assuming that we use the same 
ranking supermartingale. However, Hoeffding’s and Bernstein’s inequalities are 
applicable to limited classes of ranking supermartingales (so-called difference- 
bounded and incremental ones, respectively). There exists a randomized pro- 
gram whose tail probability for runtimes is decreasing only polynomially (not 
exponentially, see [22, Appendix G]); this witnesses that there are cases where 
the methods in [7] do not apply but ours can. 

The work [1] is also close to ours in that their supermartingales are vector- 
valued. The difference is in the orders: in [1] they use the lexicographic order
between vectors, and they aim to prove almost sure termination. In contrast, we
use the pointwise order between vectors, for overapproximating higher moments.


The Predicate-Transformer Approach to Runtime Analysis. In the run- 
time/termination analysis of randomized programs, another principal line of 
work uses predicate transformers [2,17,19], following the precedent works on 
probabilistic predicate transformers such as [21,25]. In fact, from the mathemati- 
cal point of view, the main construct for witnessing runtime/termination in those 
predicate transformer calculi (called invariants, see e.g. in [19]) is essentially the 
same thing as ranking supermartingales. Therefore the difference between the 
martingale-based and predicate-transformer approaches is mostly the matter of 
presentation—the predicate-transformer approach is more closely tied to pro- 
gram syntax and has a stronger deductive flavor. It also seems that there is less 
work on automated synthesis in the predicate-transformer approach. 

In the predicate-transformer approach, the work [17] is the closest to ours,
in that it studies variance of runtimes of randomized programs. The main differences
are as follows: (1) computing tail probabilities is not pursued in [17]; (2)
their extension from expected runtimes to variance involves an additional variable
τ, which poses a challenge in automated synthesis as well as in generalization
to even higher moments; and (3) they do not pursue automated analysis. See
Appendix F.2 of the extended version [22] for further details.


Higher Moments of Runtimes. Computing and using higher moments of 
runtimes of probabilistic systems—generalizing randomized programs—has been 
pursued before. In [9], computing moments of runtimes of finite-state Markov 
chains is reduced to a certain linear equation. In the study of randomized algo- 
rithms, the survey [10] collects a number of methods, among which are some tail 
probability bounds using higher moments. Unlike ours, none of these methods 
are language-based static ones. They do not allow automated analysis. 


Other Potential Approaches to Tail Probabilities. We discuss potential 
approaches to estimating tail probabilities, other than the martingale-based one. 

Sampling is widely employed for approximating behaviors of probabilistic 
systems; especially so in the field of probabilistic programming languages, since 
exact symbolic reasoning is hard in presence of conditioning. See e.g. [35]. We 
also used sampling to estimate tail probabilities in (2-2), Fig. 3. The main advan- 
tages of our current approach over sampling are threefold: (1) our upper bounds 
come with a mathematical guarantee, while the sampling bounds can always be 
erroneous; (2) it requires ingenuity to sample programs with nondeterminism; 
and (3) programs whose execution can take millions of years can still be ana- 
lyzed by our method in a reasonable time, without executing them. The latter 
advantage is shared by static, language-based analysis methods in general; see 
e.g. [2]. 

Another potential method is to use probabilistic model checkers such as PRISM [23].
Their algorithms are usually only applicable to finite-state models, and thus not
to randomized programs in general. Nevertheless, fixing a deadline d can make
the reachable part $S_{\le d}$ of the configuration space S finite, opening up the
possibility of using model checkers. It is an open question how to do so precisely,
and the following challenges are foreseen: (1) if the program contains continuous
distributions, the reachable part $S_{\le d}$ becomes infinite; (2) even if $S_{\le d}$ is
finite, one has to repeat (supposedly expensive) runs of a model checker for each
choice of d. In contrast, in our method, an upper bound for the tail probability
$\Pr(T_{\mathrm{run}} \ge d)$ is symbolically expressed as a function of d (Proposition 4.2).
Therefore, estimating tail probabilities for varying d is computationally cheap.


8 Conclusions and Future Work 


We provided a technique to obtain an upper bound of the tail probability of 
runtimes given a randomized algorithm and a deadline. We first extended the 
ordinary ranking supermartingale notion using the order-theoretic characteri- 
zation so that it can calculate upper bounds of higher moments of runtimes 
for randomized programs. Then by using a suitable concentration inequality, 
we introduced a method to calculate an upper bound of tail probabilities from 
upper bounds of higher moments. Our method is not only sound but also com- 
plete in a sense. Our method was obtained by combining our supermartingale 
and the concentration inequality. We also implemented an automated synthesis 
algorithm and demonstrated the applicability of our framework. 


Future Work. Example 3.8 shows that our supermartingale is not complete: it 
sometimes fails to give a tight bound for higher moments. Studying and improv- 
ing the incompleteness is one possible direction of future work. For example, the 
following questions would be interesting: Can bounds given by our supermartin- 
gale be arbitrarily bad? Can we remedy the completeness by restricting the type 
of nondeterminism? Can we define a complete supermartingale? 

Making our current method compositional is another direction of future 
research. Use of continuations, as in [18], can be a technical solution. 

We are also interested in improving the implementation. The polynomial
template program failed to give an upper bound for higher moments because
of numerical errors (see Sect. 6). We wish to remedy this situation. There exist
several studies on using numerical solvers for verification without being affected by
numerical errors [14–16,26,27]. We might make use of these works for improvements.
Abstract. Free-Choice Workflow Petri nets, also known as Workflow
Graphs, are a popular model in Business Process Modeling. 

In this paper we introduce Timed Probabilistic Workflow Nets 
(TPWNs), and give them a Markov Decision Process (MDP) semantics. 
Since the time needed to execute two parallel tasks is the maximum of 
the times, and not their sum, the expected time cannot be directly com- 
puted using the theory of MDPs with rewards. In our first contribution, 
we overcome this obstacle with the help of “earliest-first” schedulers, 
and give a single exponential-time algorithm for computing the expected 
time. 

In our second contribution, we show that computing the expected time
is #P-hard, and so polynomial algorithms are very unlikely to exist. Further,
#P-hardness holds even for workflows with a very simple structure
in which all transition times are 1 or 0, and all probabilities are 1 or
0.5.

Our third and final contribution is an experimental investigation of 
the runtime of our algorithm on a set of industrial benchmarks. Despite 
the negative theoretical results, the results are very encouraging. In par- 
ticular, the expected time of every workflow in a popular benchmark 
suite with 642 workflow nets can be computed in milliseconds. Data or 
code related to this paper is available at: [24]. 


1 Introduction 


Workflow Petri Nets are a popular model for the representation and analysis 
of business processes [1,3,7]. They are used as back-end for different notations 
like BPMN (Business Process Modeling Notation), EPC (Event-driven Process 
Chain), and UML Activity Diagrams.

There is recent interest in extending these notations with quantitative information,
like probabilities, costs, and time. The final goal is the development of
tool support for computing performance metrics, like the average cost or the 
average runtime of a business process. 

In a former paper we introduced Probabilistic Workflow Nets (PWN), a foun- 
dation for the extension of Petri nets with probabilities and rewards [11]. We 
presented a polynomial time algorithm for the computation of the expected cost 
of free-choice workflow nets, a subclass of PWN of particular interest for the 
workflow process community (see e.g. [1,10,13,14]). For example, 1386 of the 
1958 nets in the most popular benchmark suite in the literature are free-choice 
Workflow Nets [12]. 

In this paper we introduce Timed PWNs (TPWNs), an extension of PWNs
with time. Following [11], we define a semantics in terms of Markov Decision
Processes (MDPs), where, loosely speaking, the nondeterminism of the MDP models
absence of information about the order in which concurrent transitions are
executed. For every scheduler, the semantics assigns to the TPWN an expected
time to termination. Using results of [11], we prove that this expected time is
actually independent of the scheduler, and so that the notion “expected time of
a TPWN” is well defined.

We then proceed to study the problem of computing the expected time of a
sound TPWN (loosely speaking, of a TPWN that terminates successfully with
probability 1). The expected cost and the expected time have a different interplay
with concurrency. The cost of executing two tasks in parallel is the sum of the
costs (cost models e.g. salaries or power consumption), while the execution time
of two parallel tasks is the maximum of their individual execution times. For this
reason, standard reward-based algorithms for MDPs, which assume additivity
of the reward along a path, cannot be applied.
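
The difference between the two quantities already shows up for two independent tasks. The following sketch enumerates a hypothetical pair of tasks, each taking 0 or 1 time units with probability 1/2; the distributions and the helper `expectation` are assumptions for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical durations of two tasks that run in parallel:
# each takes 0 or 1 time units with probability 1/2, independently.
task_a = {0: Fr(1, 2), 1: Fr(1, 2)}
task_b = {0: Fr(1, 2), 1: Fr(1, 2)}

def expectation(combine):
    """E[combine(A, B)] for independent A ~ task_a and B ~ task_b."""
    return sum(pa * pb * combine(a, b)
               for (a, pa), (b, pb) in product(task_a.items(), task_b.items()))

expected_cost = expectation(lambda a, b: a + b)   # additive, like a reward
expected_time = expectation(max)                  # parallel tasks: the maximum
```

The expected cost is the sum of the individual expectations, whereas the expected time (3/4 here) is neither the sum nor the maximum of the individual expectations, so it cannot be accumulated additively along a path.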

Our solution to this problem uses the fact that the expected time of a TPWN 
is independent of the scheduler. We define an “earliest-first” scheduler which, 
loosely speaking, resolves the nondeterminism of the MDP by picking transi- 
tions with earliest possible firing time. Since at first sight the scheduler needs 
infinite memory, its corresponding Markov chain is infinite-state, and so of no 
help. However, we show how to construct another finite-state Markov chain with 
additive rewards, whose expected reward is equal to the expected time of the 
infinite-state chain. This finite-state Markov chain can be exponentially larger 
than the TPWN, and so our algorithm has exponential complexity. We prove 
that computing the expected time is #P-hard, even for free-choice TPWNs in 
which all transition times are either 1 or 0, and all probabilities are 1 or 1/2. So,
in particular, the existence of a polynomial algorithm implies P = NP.

In the rest of the paper we show that, despite these negative results, our 
algorithm behaves well in practice. For all 642 sound free-choice nets of the 
benchmark suite of [12], computing the expected time never takes longer than 
a few milliseconds. Looking for a more complicated set of examples, we study 
a TPWN computed from a set of logs by process mining. We observe that the 
computation of the expected time is sensitive to the distribution of the execution 


156 P. J. Meyer et al. 


time of a task. Still, our experiments show that even for complicated distributions 
leading to TPWNs with hundreds of transitions and times spanning two orders 
of magnitude the expected time can be computed in minutes. 

All missing proofs can be found in the Appendix of the full version [19]. 


2 Preliminaries 


We introduce some preliminary definitions. The full version [19] gives more 
details. 


Workflow Nets. A workflow net is a tuple $N = (P, T, F, i, o)$ where P and T
are disjoint finite sets of places and transitions; $F \subseteq (P \times T) \cup (T \times P)$ is a
set of arcs; $i, o \in P$ are distinguished initial and final places such that i has
no incoming arcs, o has no outgoing arcs, and the graph $(P \cup T, F \cup \{(o, i)\})$ is
strongly connected. For $x \in P \cup T$, we write ${}^{\bullet}x$ for the set $\{y \mid (y, x) \in F\}$ and
$x^{\bullet}$ for $\{y \mid (x, y) \in F\}$. We call ${}^{\bullet}x$ (resp. $x^{\bullet}$) the preset (resp. postset) of x. We
extend this notion to sets $X \subseteq P \cup T$ by ${}^{\bullet}X := \bigcup_{x \in X} {}^{\bullet}x$ resp. $X^{\bullet} := \bigcup_{x \in X} x^{\bullet}$.
The notions of marking, enabled transitions, transition firing, firing sequence,
and reachable marking are defined as usual. The initial marking (resp. final
marking) of a workflow net, denoted by $\boldsymbol{i}$ (resp. $\boldsymbol{o}$), has one token on place i
(resp. o), and no tokens elsewhere. A firing sequence σ is a run if $\boldsymbol{i} \xrightarrow{\sigma} \boldsymbol{o}$, i.e. if
it leads to the final marking. $\mathit{Run}_N$ denotes the set of all runs of N.


Soundness and 1-safeness. Well designed workflows should be free of deadlocks
and livelocks. This idea is captured by the notion of soundness [1,2]: A
workflow net is sound if the final marking is reachable from any reachable marking.¹
Further, in this paper we restrict ourselves to 1-safe workflows: A marking
$M$ of a workflow net $W$ is 1-safe if $M(p) \le 1$ for every place $p$, and $W$ itself is
1-safe if every reachable marking is 1-safe. We identify 1-safe markings $M$ with
the set $\{p \in P \mid M(p) = 1\}$.
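
Both properties can be checked on small nets by exhaustively exploring the reachable markings. The following is only a minimal sketch under stated assumptions: the net encoded in `PRE` and `POST` is a hypothetical five-transition example, and the function names are ours, not taken from any tool.

```python
from collections import deque

# Hypothetical workflow net: PRE/POST map each transition to its preset/postset.
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}


def fire(marking, t):
    """Fire t at a 1-safe marking, represented as a frozenset of marked places."""
    rest = marking - PRE[t]
    if rest & POST[t]:
        # A token would be added to a place that still holds one.
        raise ValueError(f"firing {t} violates 1-safeness")
    return frozenset(rest | POST[t])


def explore(initial):
    """Successor relation on all markings reachable from `initial`."""
    succ, queue = {}, deque([initial])
    while queue:
        m = queue.popleft()
        if m in succ:
            continue
        succ[m] = [fire(m, t) for t in PRE if PRE[t] <= m]
        queue.extend(succ[m])
    return succ


def is_sound(initial, final):
    """True iff `final` is reachable from every reachable marking."""
    succ = explore(initial)
    good = {final}
    changed = True
    while changed:  # backward closure from the final marking
        changed = False
        for m, nxt in succ.items():
            if m not in good and any(n in good for n in nxt):
                good.add(m)
                changed = True
    return set(succ) <= good
```

For this net, `explore(frozenset({"i"}))` visits six markings without raising, so the net is 1-safe, and `is_sound(frozenset({"i"}), frozenset({"o"}))` holds. The exploration is exponential in the worst case.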


Independence, concurrency, conflict [22]. Two transitions $t_1, t_2$ of a workflow
net are independent if ${}^\bullet t_1 \cap {}^\bullet t_2 = \emptyset$, and dependent otherwise. Given a 1-safe
marking $M$, two transitions are concurrent at $M$ if $M$ enables both of them, and
they are independent, and in conflict at $M$ if $M$ enables both of them, and they
are dependent. Finally, we recall the definition of Mazurkiewicz equivalence.
Let $N = (P, T, F, i, o)$ be a 1-safe workflow net. The relation $\equiv_1 \subseteq T^* \times T^*$ is
defined as follows: $\sigma \equiv_1 \tau$ if there are independent transitions $t_1, t_2$ and sequences
$\sigma', \sigma'' \in T^*$ such that $\sigma = \sigma' t_1 t_2 \sigma''$ and $\tau = \sigma' t_2 t_1 \sigma''$. Two sequences $\sigma, \tau \in T^*$
are Mazurkiewicz equivalent if $\sigma \equiv \tau$, where $\equiv$ is the reflexive and transitive
closure of $\equiv_1$. Observe that $\sigma \in T^*$ is a firing sequence iff every sequence $\tau \equiv \sigma$
is a firing sequence.


Confusion-freeness, free-choice workflows. Let $t$ be a transition of a workflow
net, and let $M$ be a 1-safe marking that enables $t$. The conflict set of $t$
at $M$, denoted $C(t, M)$, is the set of transitions in conflict with $t$ at $M$. A
set $U$ of transitions is a conflict set of $M$ if there is a transition $t$ such that
$U = C(t, M)$. The conflict sets of $M$ are given by $\mathcal{C}(M) = \bigcup_{t \in T} C(t, M)$. A
1-safe workflow net is confusion-free if for every reachable marking $M$ and every
transition $t$ enabled at $M$, every transition $u$ concurrent with $t$ at $M$ satisfies
$C(u, M) = C(u, M \setminus {}^\bullet t) = C(u, (M \setminus {}^\bullet t) \cup t^\bullet)$. The following result follows easily
from the definitions (see also [11]):

1 In [2], which examines many different notions of soundness, this is called easy
soundness.


Lemma 1 [11]. Let N be a 1-safe workflow net. If N is confusion-free then for 
every reachable marking M the conflict sets C(M) are a partition of the set of 
transitions enabled at M. 


A workflow net is free-choice if for every two places $p_1, p_2$, if $p_1^\bullet \cap p_2^\bullet \neq \emptyset$, then
$p_1^\bullet = p_2^\bullet$. Any free-choice net is confusion-free, and the conflict set of a transition
$t$ enabled at a marking $M$ is given by $C(t, M) = ({}^\bullet t)^\bullet$ (see e.g. [11]).
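
The free-choice condition and the formula $({}^\bullet t)^\bullet$ for conflict sets are both purely structural, so they can be evaluated without exploring markings. The sketch below assumes a hypothetical net given by the presets of its transitions; the helper names are illustrative only.

```python
# Hypothetical net: PRE maps each transition to its preset (set of input places).
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}


def place_postset(p):
    """Postset of place p: all transitions that consume a token from p."""
    return frozenset(t for t, pre in PRE.items() if p in pre)


def is_free_choice():
    """Postsets of any two input places are either equal or disjoint."""
    places = set().union(*PRE.values())
    return all(
        place_postset(p) == place_postset(q)
        or not (place_postset(p) & place_postset(q))
        for p in places
        for q in places
    )


def conflict_set(t):
    """Postset of the preset of t: all transitions sharing an input place with t."""
    return frozenset(u for p in PRE[t] for u in place_postset(p))
```

Here `conflict_set("t2")` is the set containing `t2` and `t3`, while `t4` and `t5` are alone in their conflict sets.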


3 Timed Probabilistic Workflow Nets 


In [11] we introduced a probabilistic semantics for confusion-free workflow nets.
Intuitively, at every reachable marking a choice between two concurrent transitions
is resolved nondeterministically by a scheduler, while a choice between
two transitions in conflict is resolved probabilistically; the probability of choosing
each transition is proportional to its weight. For example, in the net in Fig. 1a, at
the marking $\{p_1, p_3\}$, the scheduler can choose between the conflict sets $\{t_2, t_3\}$
and $\{t_4\}$, and if $\{t_2, t_3\}$ is chosen, then $t_2$ is chosen with probability 1/5 and $t_3$
with probability 4/5. We extend Probabilistic Workflow Nets by assigning to each
transition $t$ a natural number $\tau(t)$ modeling the time it takes for the transition
to fire, once it has been selected.²


Definition 1 (Timed Probabilistic Workflow Nets). A Timed Probabilistic
Workflow Net (TPWN) is a tuple $W = (N, w, \tau)$ where $N = (P, T, F, i, o)$
is a 1-safe confusion-free workflow net, $w: T \to \mathbb{Q}_{>0}$ is a weight function, and
$\tau: T \to \mathbb{N}$ is a time function that assigns to every transition a duration.


Timed sequences. We assign to each transition sequence $\sigma$ of $W$ and each place
$p$ a timestamp $\mu(\sigma)_p$ through a timestamp function $\mu: T^* \to \mathbb{N}_\bot^P$. The set $\mathbb{N}_\bot$ is
defined by $\mathbb{N}_\bot \stackrel{\text{def}}{=} \{\bot\} \cup \mathbb{N}$ with $\bot \le x$ and $\bot + x = \bot$ for all $x \in \mathbb{N}_\bot$. Intuitively,
if a place $p$ is marked after $\sigma$, then $\mu(\sigma)_p$ records the “arrival time” of the token
in $p$, and if $p$ is unmarked, then $\mu(\sigma)_p = \bot$. When a transition occurs, it removes
all tokens in its preset, and $\tau(t)$ time units later, puts tokens into its postset.


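2 The semantics of the model can be defined in the same way for both discrete and
continuous time, but, since our results only concern discrete time, we only consider
this case.

Formally, we define $\mu(\epsilon)_i \stackrel{\text{def}}{=} 0$, $\mu(\epsilon)_p \stackrel{\text{def}}{=} \bot$ for $p \neq i$, and $\mu(\sigma t) \stackrel{\text{def}}{=} upd(\mu(\sigma), t)$,
where the update function $upd: \mathbb{N}_\bot^P \times T \to \mathbb{N}_\bot^P$ is given by:

$$upd(x, t)_p \stackrel{\text{def}}{=} \begin{cases} \max_{q \in {}^\bullet t} x_q + \tau(t) & \text{if } p \in t^\bullet \\ \bot & \text{if } p \in {}^\bullet t \setminus t^\bullet \\ x_p & \text{if } p \notin {}^\bullet t \cup t^\bullet \end{cases}$$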


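We then define $tm(\sigma) \stackrel{\text{def}}{=} \max_{p \in P} \mu(\sigma)_p$ as the time needed to fire $\sigma$. Further
$[\![x]\!] \stackrel{\text{def}}{=} \{p \in P \mid x_p \neq \bot\}$ is the marking represented by a timestamp $x \in \mathbb{N}_\bot^P$.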


Example 1. The net in Fig. 1a is a TPWN. Weights are shown in red next to
transitions, and times are written in blue into the transitions. For the sequence
$\sigma_1 = t_1 t_3 t_4 t_5$, we have $tm(\sigma_1) = 9$, and for $\sigma_2 = t_1 t_2 t_3 t_4 t_5$, we have $tm(\sigma_2) = 10$.
Observe that the time taken by the sequences is not equal to the sum of the
durations of the transitions.
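
The timestamp function can be traced mechanically. The sketch below is an illustration under stated assumptions: the figure is not reproduced here, so the arcs and durations in `PRE`, `POST` and `TAU` are reconstructed from the values quoted in the examples of this paper and should be read as our guess at the net of Fig. 1a. `BOT` plays the role of $\bot$.

```python
# Net reconstructed from the worked examples (an assumption, not the original figure).
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}
BOT = None  # stands for the bottom element: the place holds no token


def upd(x, t):
    """Update a timestamp x (dict place -> time or BOT) by firing t."""
    if any(x[q] is BOT for q in PRE[t]):
        raise ValueError(f"{t} is not enabled")
    start = max(x[q] for q in PRE[t])  # t can start once all input tokens arrived
    y = dict(x)
    for p in PRE[t]:
        y[p] = BOT                     # tokens in the preset are removed ...
    for p in POST[t]:
        y[p] = start + TAU[t]          # ... and tau(t) later the postset is marked
    return y


def mu(sigma):
    """Timestamp reached from the initial marking by the sequence sigma."""
    places = set().union(*PRE.values(), *POST.values())
    x = {p: BOT for p in places}
    x["i"] = 0
    for t in sigma:
        x = upd(x, t)
    return x


def tm(sigma):
    """Time needed to fire sigma: the latest arrival time of a token."""
    return max(v for v in mu(sigma).values() if v is not BOT)
```

With these assumed values, `tm(["t1", "t3", "t4", "t5"])` evaluates to 9 although the durations add up to 11, because `t3` and `t4` overlap in time.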


Markov Decision Process semantics. A Markov Decision Process (MDP) is
a tuple $\mathcal{M} = (Q, q_0, Steps)$ where $Q$ is a finite set of states, $q_0 \in Q$ is the initial
state, and $Steps: Q \to 2^{dist(Q)}$ is the probability transition function. Paths of
an MDP, schedulers, and the probability measure of paths compatible with a
scheduler are defined as usual (see the Appendix of the full version [19]).

The semantics of a TPWN $W$ is a Markov Decision Process $MDP_W$. The
states of $MDP_W$ are either markings $M$ or pairs $(M, t)$, where $t$ is a transition
enabled at $M$. The intended meanings of $M$ and $(M, t)$ are “the current marking
is $M$”, and “the current marking is $M$, and $t$ has been selected to fire next.”
Intuitively, $t$ is chosen in two steps: first, a conflict set enabled at $M$ is chosen
nondeterministically, and then a transition of this set is chosen at random, with
probability proportional to its weight.

Formally, let $W = (N, w, \tau)$ be a TPWN where $N = (P, T, F, i, o)$, let $M$
be a reachable marking of $W$ enabling at least one transition, and let $C$ be a
conflict set of $M$. Let $w(C)$ be the sum of the weights of the transitions in $C$.
The probability distribution $P_{M,C}$ over $T$ is given by $P_{M,C}(t) = \frac{w(t)}{w(C)}$ if $t \in C$
and $P_{M,C}(t) = 0$ otherwise. Now, let $\mathcal{M}$ be the set of 1-safe markings of $W$, and
let $\mathcal{E}$ be the set of pairs $(M, t)$ such that $M \in \mathcal{M}$ and $M$ enables $t$. We define
the Markov decision process $MDP_W = (Q, q_0, Steps)$, where $Q = \mathcal{M} \cup \mathcal{E}$, $q_0 = \mathbf{i}$,
the initial marking of $W$, and $Steps$ is defined for the states in $\mathcal{M}$ and $\mathcal{E}$ as
follows. For every $M \in \mathcal{M}$,


- if $M$ enables no transitions, then $Steps(M)$ contains exactly one distribution,
  which assigns probability 1 to $M$, and 0 to all other states.

- if $M$ enables at least one transition, then $Steps(M)$ contains a distribution $\lambda$
  for each conflict set $C$ of $M$. The distribution is defined by: $\lambda((M, t)) = P_{M,C}(t)$
  for every $t \in C$, and $\lambda(s) = 0$ for every other state $s$.


For every $(M, t) \in \mathcal{E}$, $Steps((M, t))$ contains one single distribution that assigns
probability 1 to the marking $M'$ such that $M \xrightarrow{t} M'$, and probability 0 to every
other state.
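
The definition of $Steps$ on markings can be sketched as follows. This is an illustration only: the presets and weights below are assumptions reconstructed from the running example (weights 1 and 4 for $t_2$ and $t_3$ give the probabilities 1/5 and 4/5 mentioned earlier; the remaining weights are arbitrary), and conflict sets are computed with the free-choice formula.

```python
from fractions import Fraction

# Assumed presets and weights of the running example.
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
WEIGHT = {"t1": 1, "t2": 1, "t3": 4, "t4": 1, "t5": 1}


def conflict_sets(marking):
    """Conflict sets of a 1-safe marking (frozenset of places) in a free-choice net."""
    enabled = [t for t in PRE if PRE[t] <= marking]
    return {frozenset(u for u in enabled if PRE[u] & PRE[t]) for t in enabled}


def steps(marking):
    """Steps(M): a list of distributions, each a dict from states to probabilities."""
    sets = conflict_sets(marking)
    if not sets:
        return [{marking: Fraction(1)}]  # no transition enabled: stay in M
    return [
        {(marking, t): Fraction(WEIGHT[t], sum(WEIGHT[u] for u in c)) for t in c}
        for c in sets
    ]
```

At the marking with tokens on `p1` and `p3`, `steps` returns two distributions: one choosing between `t2` and `t3` with probabilities 1/5 and 4/5, and one selecting `t4` with probability 1. Picking one of the two is the scheduler's task.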
Fig. 1. A TPWN and its associated MDP. (Color figure online)


Example 2. Figure 1b shows a graphical representation of the MDP of the
TPWN in Fig. 1a. Black nodes represent states, white nodes probability distributions.
A black node $q$ has a white successor for each probability distribution
in $Steps(q)$. A white node $\lambda$ has a black successor for each node $q$ such that
$\lambda(q) > 0$; the arrow leading to this black successor is labeled with $\lambda(q)$, unless
$\lambda(q) = 1$, in which case there is no label. States $(M, t)$ are abbreviated to $t$.


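Schedulers. Given a TPWN $W$, a scheduler of $MDP_W$ is a function $\gamma: T^* \to 2^T$
assigning to each firing sequence $\mathbf{i} \xrightarrow{\sigma} M$ with $\mathcal{C}(M) \neq \emptyset$ a conflict set
$\gamma(\sigma) \in \mathcal{C}(M)$. A firing sequence $\mathbf{i} \xrightarrow{\sigma} M$ is compatible with a scheduler $\gamma$ if for
all partitions $\sigma = \sigma_1 t \sigma_2$ for some transition $t$, we have $t \in \gamma(\sigma_1)$.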


Example 3. In the TPWN of Fig. 1a, after firing $t_1$ two conflict sets become
concurrently enabled: $\{t_2, t_3\}$ and $\{t_4\}$. A scheduler picks one of the two. If the
scheduler picks $\{t_2, t_3\}$ then $t_2$ may occur, and in this case, since firing $t_2$ does
not change the marking, the scheduler chooses again one of $\{t_2, t_3\}$ and $\{t_4\}$. So
there are infinitely many possible schedulers, differing only in how many times
they pick $\{t_2, t_3\}$ before picking $\{t_4\}$.


Definition 2 ((Expected) Time until a state is reached). Let $\pi$ be an
infinite path of $MDP_W$, and let $M$ be a reachable marking of $W$. Observe that $M$
is a state of $MDP_W$. The time needed to reach $M$ along $\pi$, denoted $tm(M, \pi)$,
is defined as follows: If $\pi$ does not visit $M$, then $tm(M, \pi) \stackrel{\text{def}}{=} \infty$; otherwise,
$tm(M, \pi) \stackrel{\text{def}}{=} tm(\Sigma(\pi'))$, where $\Sigma(\pi')$ is the transition sequence corresponding to
the shortest prefix $\pi'$ of $\pi$ ending at $M$. Given a scheduler $S$, the expected time
until reaching $M$ is defined as

$$ET_W^S(M) \stackrel{\text{def}}{=} \sum_{\pi \in Paths^S} tm(M, \pi) \cdot Prob^S(\pi)$$

and the expected time $ET_W^S$ is defined as $ET_W^S \stackrel{\text{def}}{=} ET_W^S(\mathbf{o})$, i.e. the expected
time until reaching the final marking.


In [11] we proved a result for Probabilistic Workflow Nets (PWNs) with 
rewards, showing that the expected reward of a PWN is independent of the 
scheduler (intuitively, this is the case because in a confusion-free Petri net 
the scheduler only determines the logical order in which transitions occur, but 
not which transitions occur). Despite the fact that, contrary to rewards, the 
execution time of a firing sequence is not the sum of the execution times of 
its transitions, the proof carries over to the expected time with only minor 
modifications. 


Theorem 1. Let $W$ be a TPWN.

(1) There exists a value $ET_W$ such that for every scheduler $S$ of $W$, the expected
    time $ET_W^S$ of $W$ under $S$ is equal to $ET_W$.
(2) $ET_W$ is finite iff $W$ is sound.


By this theorem, the expected time $ET_W$ can be computed by choosing a
suitable scheduler $S$, and computing $ET_W^S$.


4 Computation of the Expected Time 


We show how to compute the expected time of a TPWN. We fix an appropriate 
scheduler, show that it induces a finite-state Markov chain, define an appropriate 
reward function for the chain, and prove that the expected time is equal to the 
expected reward. 


4.1 Earliest-First Scheduler 


Consider a firing sequence $\mathbf{i} \xrightarrow{\sigma} M$. We define the starting time of a conflict set
$C \in \mathcal{C}(M)$ as the earliest time at which the transitions of $C$ become enabled.
This occurs after all tokens of ${}^\bullet C$ arrive³, and so the starting time of $C$ is the
maximum of $\mu(\sigma)_p$ for $p \in {}^\bullet C$ (recall that $\mu(\sigma)_p$ is the latest time at which a
token arrives at $p$ while firing $\sigma$).

Intuitively, the “earliest-first” scheduler always chooses the conflict set with
the earliest starting time (if there are multiple such conflict sets, the scheduler
chooses any one of them). Formally, recall that a scheduler is a mapping $\gamma: T^* \to 2^T$
such that for every firing sequence $\mathbf{i} \xrightarrow{\sigma} M$, the set $\gamma(\sigma)$ is a conflict set of
$M$. We define the earliest-first scheduler $\gamma$ by:

$$\gamma(\sigma) \stackrel{\text{def}}{=} \arg\min_{C \in \mathcal{C}(M)} \max_{p \in {}^\bullet C} \mu(\sigma)_p \quad \text{where } M \text{ is given by } \mathbf{i} \xrightarrow{\sigma} M.$$


3 This is proved in Lemma 7 in the Appendix of the full version [19]. 
Example 4. Figure 2a shows the Markov chain induced by the “earliest-first”
scheduler defined above in the MDP of Fig. 1b. Initially we have a token at $i$
with arrival time 0. After firing $t_1$, which takes time 1, we obtain tokens in $p_1$ and
$p_3$ with arrival time 1. In particular, the conflict sets $\{t_2, t_3\}$ and $\{t_4\}$ become
enabled at time 1. The scheduler can choose any of them, because they have the
same starting time. Assume it chooses $\{t_2, t_3\}$. The Markov chain now branches
into two transitions, corresponding to firing $t_2$ and $t_3$ with probabilities 1/5 and
4/5, respectively. Consider the branch in which $t_2$ fires. Since $t_2$ starts at time
1 and takes 4 time units, it removes the token from $p_1$ at time 1, and adds a
new token to $p_1$ with arrival time 5; the token at $p_3$ is not affected, and it keeps
its arrival time of 1. So we have $\mu(t_1 t_2) = \{p_1 \mapsto 5, p_3 \mapsto 1\}$ (meaning $\mu(t_1 t_2)_{p_1} = 5$,
$\mu(t_1 t_2)_{p_3} = 1$, and $\mu(t_1 t_2)_p = \bot$ otherwise). Now the conflict sets $\{t_2, t_3\}$ and
$\{t_4\}$ are enabled again, but with a difference: while $\{t_4\}$ has been enabled since
time 1, the set $\{t_2, t_3\}$ is now enabled since time $\mu(t_1 t_2)_{p_1} = 5$. The scheduler
must now choose $\{t_4\}$, leading to the marking that puts tokens on $p_1$ and $p_4$
with arrival times $\mu(t_1 t_2 t_4)_{p_1} = 5$ and $\mu(t_1 t_2 t_4)_{p_4} = 6$. In the next steps the
scheduler always chooses $\{t_2, t_3\}$ until $t_5$ becomes enabled. The final marking $\mathbf{o}$
can be reached after time 9, through $t_1 t_3 t_4 t_5$ with probability 4/5, or with times
$10 + 4k$ for $k \in \mathbb{N}$, through $t_1 t_2 t_4 t_2^k t_3 t_5$ with probability $(1/5)^{k+1} \cdot 4/5$ (the times
at which the final marking can be reached are written in blue inside the final
states).
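
The behaviour described in this example can be replayed by unfolding the runs chosen by an earliest-first scheduler up to a bounded depth. The sketch below makes several assumptions: the net data are reconstructed from the values quoted in the examples, ties between conflict sets are broken by transition name (which happens to select $\{t_2, t_3\}$ first, as in the example), and the depth bound truncates the infinite chain.

```python
from collections import defaultdict
from fractions import Fraction

# Net reconstructed from the worked examples (an assumption).
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}
WEIGHT = {"t1": 1, "t2": 1, "t3": 4, "t4": 1, "t5": 1}


def earliest_first(x):
    """Conflict set with the earliest starting time at timestamp x (marked places only)."""
    enabled = [t for t in PRE if PRE[t] <= set(x)]
    sets = {tuple(sorted(u for u in enabled if PRE[u] & PRE[t])) for t in enabled}
    start = lambda c: max(x[p] for t in c for p in PRE[t])
    return min(sets, key=lambda c: (start(c), c))  # ties broken by name


def fire(x, t):
    """Timestamp after firing t: same as upd, with unmarked places left out."""
    start = max(x[p] for p in PRE[t])
    y = {p: v for p, v in x.items() if p not in PRE[t]}
    y.update({p: start + TAU[t] for p in POST[t]})
    return y


def completion_times(x, prob, depth, out):
    """Accumulate in `out` the probability of each completion time."""
    if set(x) == {"o"}:
        out[x["o"]] += prob
    elif depth > 0:
        c = earliest_first(x)
        total = sum(WEIGHT[t] for t in c)
        for t in c:
            completion_times(fire(x, t), prob * Fraction(WEIGHT[t], total), depth - 1, out)
    return out


dist = completion_times({"i": 0}, Fraction(1), 8, defaultdict(Fraction))
```

Under these assumptions `dist` assigns probability 4/5 to time 9, 4/25 to time 10 and 4/125 to time 14, in line with the formula $(1/5)^{k+1} \cdot 4/5$ for time $10 + 4k$.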


Theorem 2 below shows that the earliest-first scheduler only needs finite mem- 
ory, which is not clear from the definition. The construction is similar to those 
of [6, 15,16]. However, our proof crucially depends on TPWNs being confusion- 
free. 


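Theorem 2. Let $H \stackrel{\text{def}}{=} \max_{t \in T} \tau(t)$ be the maximum duration of the transitions
of $T$, and let $[H]_\bot \stackrel{\text{def}}{=} \{\bot, 0, 1, \ldots, H\} \subseteq \mathbb{N}_\bot$. There are functions $\nu: T^* \to [H]_\bot^P$
(compare with $\mu: T^* \to \mathbb{N}_\bot^P$), $f: [H]_\bot^P \times T \to [H]_\bot^P$ and $r: [H]_\bot^P \to \mathbb{N}$ such that
for every $\sigma = t_1 \ldots t_n \in T^*$ compatible with $\gamma$ and for every $t \in T$ enabled by $\sigma$:

$$\gamma(\sigma) = \arg\min_{C \in \mathcal{C}([\![\nu(\sigma)]\!])} \max_{p \in {}^\bullet C} \nu(\sigma)_p \qquad (1)$$
$$\nu(\sigma t) = f(\nu(\sigma), t) \qquad (2)$$
$$tm(\sigma) = \max_{p \in P} \nu(\sigma)_p + \sum_{k=0}^{n-1} r(\nu(t_1 \ldots t_k)) \qquad (3)$$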


Observe that, unlike $\mu$, the range of $\nu$ is finite. We call it the finite abstraction
of $\mu$. Equation 1 states that $\gamma$ can be computed directly from the finite abstraction
$\nu$. Equation 2 shows that $\nu(\sigma t)$ can be computed from $\nu(\sigma)$ and $t$. So $\gamma$ only
needs to remember an element of $[H]_\bot^P$, which implies that it only requires finite
memory. Finally, observe that the function $r$ of Eq. 3 has a finite domain, and
so it allows us to use $\nu$ to compute the time needed by $\sigma$.
(a) Infinite MC for scheduler using $\mu(\sigma)$, with final states labeled by $tm(\sigma)$.
(b) Finite MC for scheduler using $\nu(\sigma)$, with states labeled by rewards $r(\nu(\sigma))$.

Fig. 2. Two Markov chains for the “earliest-first” scheduler. (Color figure online)


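The formal definition of the functions $\nu$, $f$, and $r$ is given below, together
with the definition of the auxiliary operator $\ominus: \mathbb{N}_\bot^P \times \mathbb{N} \to \mathbb{N}_\bot^P$:

$$(x \ominus n)_p \stackrel{\text{def}}{=} \begin{cases} \max(x_p - n, 0) & \text{if } x_p \neq \bot \\ \bot & \text{if } x_p = \bot \end{cases} \qquad f(x, t) \stackrel{\text{def}}{=} upd(x, t) \ominus \max_{p \in {}^\bullet t} x_p$$

$$\nu(\epsilon) \stackrel{\text{def}}{=} \mu(\epsilon) \text{ and } \nu(\sigma t) \stackrel{\text{def}}{=} \mu(\sigma t) \ominus \max_{p \in {}^\bullet t} \mu(\sigma)_p \qquad r(x) \stackrel{\text{def}}{=} \min_{C \in \mathcal{C}([\![x]\!])} \max_{p \in {}^\bullet C} x_p$$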


Example 5. Figure 2b shows the finite-state Markov chain induced by the
“earliest-first” scheduler computed using the abstraction $\nu$. Consider the firing
sequence $t_1 t_3$. We have $\mu(t_1 t_3) = \{p_2 \mapsto 3, p_3 \mapsto 1\}$, i.e. the tokens in $p_2$ and $p_3$ arrive
at times 3 and 1, respectively. Now we compute $\nu(t_1 t_3)$, which corresponds to
the local arrival times of the tokens, i.e. the time elapsed since the last transition
starts to fire until the token arrives. Transition $t_3$ starts to fire at time 1,
and so the local arrival times of the tokens in $p_2$ and $p_3$ are 2 and 0, respectively,
i.e. we have $\nu(t_1 t_3) = \{p_2 \mapsto 2, p_3 \mapsto 0\}$. Using these local times we compute the
local starting time of the conflict sets enabled at $\{p_2, p_3\}$. The scheduler always
chooses the conflict set with earliest local starting time. In Fig. 2b the earliest
local starting time of the state reached by firing $\sigma$, which is denoted $r(\nu(\sigma))$, is
written in blue inside the state. The theorem above shows that this scheduler
always chooses the same conflict sets as the one which uses the function $\mu$, and
that the time of a sequence can be obtained by adding the local starting times.
This allows us to consider the earliest local starting time of a state as a reward
associated to the state; then, the time taken by a sequence is equal to the sum
of the rewards along the corresponding path of the chain. For example, we have
$tm(t_1 t_2 t_4 t_3 t_5) = 0 + 1 + 0 + 4 + 2 + 3 = 10$.

Finally, let us see how $\nu(\sigma t)$ is computed from $\nu(\sigma)$ for $\sigma = t_1 t_2 t_4$ and $t = t_2$.
We have $\nu(\sigma) = \{p_1 \mapsto 4, p_4 \mapsto 5\}$, i.e. the local arrival times for the tokens in $p_1$ and $p_4$
are 4 and 5, respectively. Now $\{t_2, t_3\}$ is scheduled next, with local starting time
$r(\nu(\sigma)) = \nu(\sigma)_{p_1} = 4$. If $t_2$ fires, then, since $\tau(t_2) = 4$, we first add 4 to the time
of $p_1$, obtaining $\{p_1 \mapsto 8, p_4 \mapsto 5\}$. Second, we subtract 4 from all times, to obtain the
time elapsed since $t_2$ started to fire (for local times the origin of time changes
every time a transition fires), yielding the final result $\nu(\sigma t_2) = \{p_1 \mapsto 4, p_4 \mapsto 1\}$.
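
The finite abstraction can be checked against the numbers of this example with a few lines of code. The sketch below is hedged in the same way as before: the net data are our reconstruction from the quoted values, timestamps only list marked places (absent places stand for $\bot$), and `reward` uses the fact that in a free-choice net all transitions of a conflict set share the same preset.

```python
# Net reconstructed from the worked examples (an assumption).
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}


def reward(x):
    """r(x): earliest local starting time among the enabled conflict sets."""
    enabled = [t for t in PRE if PRE[t] <= set(x)]
    return min(max(x[p] for p in PRE[t]) for t in enabled)


def f(x, t):
    """f(x, t): apply upd, then shift the origin of time to the start of t."""
    start = max(x[p] for p in PRE[t])
    y = {p: v for p, v in x.items() if p not in PRE[t]}
    y.update({p: start + TAU[t] for p in POST[t]})
    return {p: max(v - start, 0) for p, v in y.items()}


def nu(sigma):
    """Local timestamp after sigma, computed step by step as in Eq. 2."""
    x = {"i": 0}
    for t in sigma:
        x = f(x, t)
    return x


def tm_from_rewards(sigma):
    """Right-hand side of Eq. 3: rewards of the visited states plus the final maximum."""
    x, total = {"i": 0}, 0
    for t in sigma:
        total += reward(x)
        x = f(x, t)
    return total + max(x.values())
```

For the sequence $t_1 t_2 t_4 t_3 t_5$ the function `tm_from_rewards` returns 10, matching the sum of rewards given above, and `nu(["t1", "t2", "t4", "t2"])` yields local times 4 and 1 for $p_1$ and $p_4$. The identity is only claimed for sequences compatible with the earliest-first scheduler.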


4.2 Computation in the Probabilistic Case 


Given a TPWN and its corresponding MDP, in the previous section we have 
defined a finite-state earliest-first scheduler and a reward function of its induced 
Markov chain. The reward function has the following property: the execution 
time of a firing sequence compatible with the scheduler is equal to the sum of 
the rewards of the states visited along it. From the theory of Markov chains with 
rewards, it follows that the expected accumulated reward until reaching a certain 
state, provided that this state is reached with probability 1, can be computed 
by solving a linear equation system. We use this result to compute the expected
time $ET_W$.

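Let $W$ be a sound TPWN. For every firing sequence $\sigma$ compatible with the
earliest-first scheduler $\gamma$, the finite-state Markov chain induced by $\gamma$ contains a
state $x = \nu(\sigma) \in [H]_\bot^P$. Let $C_x$ be the conflict set scheduled by $\gamma$ at $x$. We define
a system of linear equations with variables $X_x$, one for each state $x$:

$$X_x = \begin{cases} r(x) + \sum_{t \in C_x} \frac{w(t)}{w(C_x)} \cdot X_{f(x,t)} & \text{if } [\![x]\!] \neq \mathbf{o} \\ \max_{p \in P} x_p & \text{if } [\![x]\!] = \mathbf{o} \end{cases} \qquad (4)$$

The solution of the system is the expected reward of a path leading from $\mathbf{i}$ to $\mathbf{o}$.
By the theory of Markov chains with rewards/costs ([4], Chap. 10.5), we have: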
By the theory of Markov chains with rewards/costs ([4], Chap. 10.5), we have: 


Lemma 2. Let $W$ be a sound TPWN. Then the system of linear equations (4)
has a unique solution $X$, and $ET_W = X_{\nu(\epsilon)}$.
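
To make the construction concrete, the sketch below builds the finite-state chain for a small net and solves system (4) exactly with rational arithmetic. It is an illustration and not the implementation evaluated later: the net data are reconstructed from the values quoted in the examples, ties are broken by transition name, and the solver is plain Gauss-Jordan elimination.

```python
from fractions import Fraction

# Net reconstructed from the worked examples (an assumption).
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}
WEIGHT = {"t1": 1, "t2": 1, "t3": 4, "t4": 1, "t5": 1}


def scheduled(x):
    """Earliest-first conflict set at local timestamp x and its starting time r(x)."""
    enabled = [t for t in PRE if PRE[t] <= set(x)]
    sets = {tuple(sorted(u for u in enabled if PRE[u] & PRE[t])) for t in enabled}
    start = lambda c: max(x[p] for t in c for p in PRE[t])
    best = min(sets, key=lambda c: (start(c), c))
    return best, start(best)


def step(x, t, start):
    """f(x, t) for a transition t of the scheduled conflict set."""
    y = {p: v for p, v in x.items() if p not in PRE[t]}
    y.update({p: start + TAU[t] for p in POST[t]})
    return {p: max(v - start, 0) for p, v in y.items()}


def build_chain(init):
    """Map state number -> (reward, [(probability, successor number)])."""
    key = lambda x: frozenset(x.items())
    index, states, chain, todo = {key(init): 0}, [init], {}, [0]
    while todo:
        n = todo.pop()
        x = states[n]
        if set(x) == {"o"}:                      # final marking: second case of (4)
            chain[n] = (max(x.values()), [])
            continue
        c, start = scheduled(x)
        total, out = sum(WEIGHT[t] for t in c), []
        for t in c:
            y = step(x, t, start)
            if key(y) not in index:
                index[key(y)] = len(states)
                states.append(y)
                todo.append(index[key(y)])
            out.append((Fraction(WEIGHT[t], total), index[key(y)]))
        chain[n] = (start, out)
    return chain


def solve(chain):
    """Solve (I - P) X = r exactly; returns one value per state."""
    n = len(chain)
    m = [[Fraction(int(i == j)) for j in range(n)] + [Fraction(chain[i][0])] for i in range(n)]
    for i in range(n):
        for prob, j in chain[i][1]:
            m[i][j] -= prob
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]


expected_time = solve(build_chain({"i": 0}))[0]
```

For the assumed net this yields `expected_time == Fraction(47, 5)`, i.e. 9.4, which agrees with summing $9 \cdot 4/5$ and $(10 + 4k) \cdot (1/5)^{k+1} \cdot 4/5$ over all $k$. The chain built here has 11 states, although the infinite chain of the previous section has unboundedly many.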


Theorem 3. Let $W$ be a TPWN. Then $ET_W$ is either $\infty$ or a rational number
and can be computed in single exponential time.


Proof. We assume that the input has size $n$ and all times and weights are given
in binary notation. Testing whether $W$ is sound can be done by exploration of
the state space of reachable markings in time $O(2^n)$. If $W$ is unsound, we have
$ET_W = \infty$.

Now assume that $W$ is sound. By Lemma 2, $ET_W$ is the solution to the
linear equation system (4), which is finite and has rational coefficients, so it is a
rational number. The number of variables $|X|$ of (4) is bounded by the size of
$[H]_\bot^P$, and as $H = \max_{t \in T} \tau(t)$ we have $|X| \le (1 + H)^{|P|} \le (1 + 2^n)^n \le 2^{n^2 + n}$.
The linear equation system can be solved in time $O(n^2 \cdot |X|^3)$ and therefore in
time $O(2^{p(n)})$ for some polynomial $p$.


5 Lower Bounds for the Expected Time 


We analyze the complexity of computing the expected time of a TPWN.
Botezatu et al. show in [5] that deciding if the expected time exceeds a given
bound is NP-hard. However, their reduction produces TPWNs with weights and
times of arbitrary size. An open question is whether the expected time can be
computed in polynomial time when the times (and weights) must be taken from a
finite set. We prove that this is not the case unless P = NP, even if all times
are 0 or 1, all weights are 1, the workflow net is sound, acyclic and free-choice,
and the size of each conflict set is at most 2 (resulting only in probabilities 1 or
1/2). Further, we show that even computing an $\epsilon$-approximation is equally hard.
These two results are a consequence of the main theorem of this section:
computing the expected time is #P-hard [23]. For example, counting the number
of satisfying assignments for a boolean formula (#SAT) is a #P-complete
problem. Therefore a polynomial-time algorithm for a #P-hard problem would
imply P = NP.

The problem used for the reduction is defined on PERT networks [9], in the 
specialized form of two-state stochastic PERT networks [17], described below. 


Definition 3. A two-state stochastic PERT network is a tuple $PN = (G, s, t, p)$,
where $G = (V, E)$ is a directed acyclic graph with vertices $V$, representing
events, and edges $E$, representing tasks, with a single source vertex $s$
and sink vertex $t$, and where the vector $p \in \mathbb{Q}^E$ assigns to each edge $e \in E$ a
rational probability $p_e \in [0, 1]$. We assume that all $p_e$ are written in binary.
Each edge $e \in E$ of $PN$ defines a random variable $X_e$ with distribution
$Pr(X_e = 1) = p_e$ and $Pr(X_e = 0) = 1 - p_e$. All $X_e$ are assumed to be independent.
The project duration $PD$ of $PN$ is the length of the longest path in the network

$$PD(PN) \stackrel{\text{def}}{=} \max_{\pi \in \Pi} \sum_{e \in \pi} X_e$$

where $\Pi$ is the set of paths from vertex $s$ to vertex $t$. As this defines a random
variable, the expected project duration of $PN$ is then given by $E(PD(PN))$.


Example 6. Figure 3a shows a small PERT network (without $p$), where the
project duration depends on the paths $\Pi = \{e_1 e_3 e_6, e_1 e_4 e_7, e_2 e_5 e_7\}$.


The following problem is #P-hard (from [17], using the results from [20]): 


Given: A two-state stochastic PERT network PN. 
Compute: The expected project duration E(PD(PN)). 
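
For intuition, the expected project duration of a tiny network can be computed by enumerating all outcomes of the edge variables. The sketch below is a brute-force illustration only: its running time is exponential in the number of edges, and the three-edge network `EDGES` with probabilities `PROB` is a hypothetical instance, not the network of Fig. 3a.

```python
from fractions import Fraction
from itertools import product


def paths(edges, s, t):
    """All s-t paths of a DAG, each given as a list of edge names."""
    if s == t:
        return [[]]
    return [
        [e] + rest
        for e, (u, v) in edges.items()
        if u == s
        for rest in paths(edges, v, t)
    ]


def expected_duration(edges, prob, s, t):
    """E(PD): sum over all 0/1 outcomes of (probability * longest path length)."""
    names = sorted(edges)
    all_paths = paths(edges, s, t)
    expectation = Fraction(0)
    for outcome in product((0, 1), repeat=len(names)):
        x = dict(zip(names, outcome))
        weight = Fraction(1)
        for e in names:
            weight *= prob[e] if x[e] else 1 - prob[e]
        expectation += weight * max(sum(x[e] for e in path) for path in all_paths)
    return expectation


# Hypothetical instance: a two-edge path s -> u -> t in parallel with a direct edge.
EDGES = {"a": ("s", "u"), "b": ("u", "t"), "c": ("s", "t")}
PROB = {e: Fraction(1, 2) for e in EDGES}
```

Here `expected_duration(EDGES, PROB, "s", "t")` evaluates to 9/8. The maximum over paths is what prevents the expectation from decomposing edge by edge.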
First reduction: 0/1 times, arbitrary weights. We reduce the problem
above to computing the expected time of an acyclic TPWN with 0/1 times but
arbitrary weights. Given a two-state stochastic PERT network $PN$, we construct
a timed probabilistic workflow net $W_{PN}$ as follows:

- For each edge $e = (u, v) \in E$, add the “gadget net” shown in Fig. 3b. Assign
  $w(t_{e,0}) = 1 - p_e$, $w(t_{e,1}) = p_e$, $\tau(t_{e,0}) = 0$, and $\tau(t_{e,1}) = 1$.

- For each vertex $v \in V$, add a transition $t_v$ with arcs from each $[e, v]$ such that
  $e = (u, v) \in E$ for some $u$ and arcs to each $[v, e]$ such that $e = (v, w) \in E$ for
  some $w$. Assign $w(t_v) = 1$ and $\tau(t_v) = 0$.

- Add the place $i$ with an arc to $t_s$ and the place $o$ with an arc from $t_t$.


(a) PERT network $PN$. (b) Gadget for $e = (u, v)$ with rational weights $p_e$, $\bar{p}_e$.
(c) Equivalent gadget for $e$ with weights 1 for $p_e = 5/8 = (0.101)_2$.
(d) Timed probabilistic workflow net $W_{PN}$.

Fig. 3. A PERT network and its corresponding timed probabilistic workflow net. The
weight $\bar{p}$ is short for $1 - p$. Transitions without annotations have weight 1.


The result of applying this construction to the PERT network from Fig. 3a 
is shown in Fig. 3d. It is easy to see that this workflow net is sound, as from 
any reachable marking, we can fire enabled transitions corresponding to the 
edges and vertices of the PERT network in the topological order of the graph,
eventually firing $t_t$ and reaching $\mathbf{o}$. The net is also acyclic and free-choice.


Lemma 3. Let $PN$ be a two-state stochastic PERT network and let $W_{PN}$ be its
corresponding TPWN by the construction above. Then $ET_{W_{PN}} = E(PD(PN))$.
Second reduction: 0/1 times, 0/1 weights. The network constructed this
way already uses times 0 and 1; however, the weights still use arbitrary rational
numbers. We now replace the gadget nets from Fig. 3b by equivalent nets where
all transitions have weight 1. The idea is to use the binary encoding of the
probabilities $p_e$, deciding if the time is 0 or 1 by a sequence of coin flips. We
assume that $p_e = \sum_{i=0}^{k} 2^{-i} p_i$ for some $k \in \mathbb{N}$ and $p_i \in \{0, 1\}$ for $0 \le i \le k$. The
replacement is shown in Fig. 3c for $p_e = 5/8 = (0.101)_2$.
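
The coin-flip idea can be checked numerically. The sketch below encodes one plausible reading of the gadget, which is an assumption on our part since the figure is not reproduced here: at the $i$-th fair coin flip, one outcome stops and selects time $p_i$, the other continues to the next flip, and running out of digits selects time 0. The code indexes the digits after the binary point starting from 1.

```python
from fractions import Fraction


def binary_digits(p):
    """Digits after the binary point of a dyadic rational 0 <= p < 1."""
    p = Fraction(p)
    if not 0 <= p < 1 or p.denominator & (p.denominator - 1):
        raise ValueError("expected a dyadic rational in [0, 1)")
    digits = []
    while p != 0:
        p *= 2
        digits.append(int(p >= 1))
        p -= digits[-1]
    return digits


def prob_time_one(digits):
    """Probability that the chain of fair coin flips selects time 1."""
    prob, reach = Fraction(0), Fraction(1)
    for bit in digits:
        prob += reach / 2 * bit   # half of the remaining mass stops here with this digit
        reach /= 2                # the other half moves on to the next flip
    return prob
```

For example, `binary_digits(Fraction(5, 8))` gives `[1, 0, 1]` and `prob_time_one([1, 0, 1])` gives 5/8, so only probabilities 1/2 are needed to realize $p_e$.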


Approximating the expected time is #P-hard. We show that computing
an $\epsilon$-approximation for $ET_W$ is #P-hard [17,20].


Theorem 4. The following problem is #P-hard:

Given: A sound, acyclic and free-choice TPWN $W$ where all transitions
$t$ satisfy $w(t) = 1$, $\tau(t) \in \{0, 1\}$ and $|({}^\bullet t)^\bullet| \le 2$, and an $\epsilon > 0$.
Compute: A rational $r$ such that $r - \epsilon < ET_W < r + \epsilon$.


6 Experimental Evaluation 


We have implemented our algorithm to compute the expected time of a TPWN
as a package of the tool ProM⁴. It is available via the package manager of the
latest nightly build under the package name WorkflowNetAnalyzer.

We evaluated the algorithm on two different benchmarks. All experiments in 
this section were run on the same machine equipped with an Intel Core i7-6700K 
CPU and 32 GB of RAM. We measure the actual runtime of the algorithm, split 
into construction of the Markov chain and solving the linear equation system, 
and exclude the time overhead due to starting ProM and loading the plugin. 


6.1 IBM Benchmark 


We evaluated the tool on a set of 1386 workflow nets extracted from a collection 
of five libraries of industrial business processes modeled in the IBM WebSphere 
Business Modeler [12]. All of the 1386 nets in the benchmark libraries are free- 
choice and therefore confusion-free. We selected the sound and 1-safe nets among 
them, which are 642 nets. Out of these, 409 are marked graphs, i.e. the size of 
any conflict set is 1. Out of the remaining 233 nets, 193 are acyclic and 40 cyclic. 

As these nets do not come with probabilistic or time information, we annotated
transitions with integer weights and times chosen uniformly from different
intervals: (1) $w(t) = \tau(t) = 1$, (2) $w(t), \tau(t) \in [1, 10^3]$ and (3) $w(t), \tau(t) \in [1, 10^6]$.
For each interval, we annotated the transitions of each net with random weights
and times, and computed the expected time of all 642 nets.

For all intervals, we computed the expected time for any net in less than
50 ms. The analysis time did not differ much for different intervals. The solving
time for the linear equation system is on average 5% of the total analysis time,
and at most 68%. The results for the nets with the longest analysis times are
given in Table 1. They show that even for nets with a huge state space, thanks
to the earliest-first scheduler, only a small number of reachable markings is
explored.

4 http://www.promtools.org/.


Table 1. Analysis times and size of the state space |X| for the 4 nets with the highest
analysis times, given for each of the three intervals [1], [10^3], [10^6] of possible times.
Here, |R^N| denotes the number of reachable markings of the net.

Net             Net info & size               Analysis time (ms)       |X|
                cyclic  |P|  |T|  |R^N|       [1]   [10^3]  [10^6]     [1]  [10^3]  [10^6]
m1.s30.s703     no      264  286  6117        40.3  44.6    43.8       304  347     347
m1.s30.s596     yes     214  230  623         21.6  24.4    23.6       208  232     234
b3.8371_s1986   no      235  101  2·10^17     16.8  16.4    16.5       101  102     102
b2.s275-s2417   no      103  68   237626      14.2  17.8    15.9       355  460     431

6.2 Process Mining Case Study 


As a second benchmark, we evaluated the algorithm on a model of a loan appli- 
cation process. We used the data from the BPI Challenge 2017 [8], an event log 
containing 31509 cases provided by a financial institute, and took as a model 
of the process the final net from the report of the winner of the academic cate- 
gory [21], a simple model with high fitness and precision w.r.t. the event log. 


Fig. 4. Net from [21] of process for personal loan applications in a financial institute,
annotated with mean waiting times and local trace weights. Black transitions are invisible
transitions not appearing in the event log with time 0.
Table 2. Expected time, analysis time and state space size for the net in Fig. 4 for
various distributions, where memout denotes reaching the memory limit.

Distribution    |T|    ET_W      |X|      Analysis time
                                          Total    Construction  Solving
Deterministic   19     24d 1h    33       40ms     18ms          22ms
Histogram/12h   141    24d 18h   4054     244ms    232ms         12ms
Histogram/6h    261    24d 21h   15522    2.1s     1.85s         0.3s
Histogram/4h    375    24d 22h   34063    10s      6s            4s
Histogram/2h    666    24d 23h   122785   346s     52s           294s
Histogram/1h    1117   -         422614   -        12.7min       memout


Using the ProM plugin “Multi-perspective Process Explorer” [18] we anno- 
tated each transition with waiting times and each transition in a conflict set 
with a local percentage of traces choosing this transition when this conflict set 
is enabled. The net with mean times and weights as percentages is displayed in 
Fig. 4. 

For a first analysis, we simply set the execution time of each transition deter- 
ministically to its mean waiting time. However, note that the two transitions 
“O_Create Offer” and “W_Complete application” are executed in parallel, and 
therefore the distribution of their execution times influences the total expected 
time. Therefore we also annotated these two transitions with a histogram of 
possible execution times from each case. Then we split them up into multiple 
transitions by grouping the times into buckets of a given interval size, where 
each bucket creates a transition with an execution time equal to the beginning 
of the interval, and a weight equal to the number of cases with a waiting time 
contained in the interval. The times for these transitions range from 6 ms to 31 
days. As bucket sizes we chose 12,6,4,2 and 1 hour(s). The net always has 14 
places and 15 reachable markings, but a varying number of transitions depend- 
ing on the chosen bucket size. For the net with the mean as the deterministic 
time and for the nets with histograms for each bucket size, we then analyzed the 
expected execution time using our algorithm. 
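
The bucketing step described above can be sketched in a few lines. This is only an illustration of the idea, with hypothetical function and variable names and made-up sample values: each observed waiting time is mapped to the beginning of its bucket, and the number of cases per bucket becomes the weight of the corresponding transition.

```python
from collections import Counter


def bucket_transitions(waiting_times, bucket_size):
    """Return sorted (time, weight) pairs, one per non-empty bucket.

    The time is the beginning of the bucket interval and the weight is the
    number of observed cases whose waiting time falls into that interval.
    """
    counts = Counter((w // bucket_size) * bucket_size for w in waiting_times)
    return sorted(counts.items())
```

For instance, with waiting times of 1, 5, 13, 14 and 30 hours and 12-hour buckets, `bucket_transitions([1, 5, 13, 14, 30], 12)` returns `[(0, 2), (12, 2), (24, 1)]`, i.e. three transitions replace the original one. Smaller buckets give more transitions and larger maximal durations relative to the time unit, which is consistent with the growth of |T| and |X| in Table 2.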

The results are given in Table 2. They show that using the complete distribution
of times instead of only the mean can lead to much more precise results.
When the linear equation system becomes very large, the solver time dominates
the construction time of the system. This may be because we chose to use an
exact solver for sparse linear equation systems. In the future, this could possibly
be improved by using an approximate iterative solver.


7 Conclusion 


We have shown that computing the expected time to termination of a probabilistic
workflow net in which transition firings have deterministic durations is
#P-hard. This is the case even if the net is free-choice, and both probabilities
and times can be written down with a constant number of bits. So, surprisingly,
computing the expected time is much harder than computing the expected cost,
for which there is a polynomial algorithm [11].

We have also presented an exponential algorithm for computing the expected 
time based on earliest-first schedulers. Its performance depends crucially on the 
maximal size of conflict sets that can be concurrently enabled. In the most 
popular suite of industrial benchmarks this number turns out to be small. So, 
very satisfactorily, the expected time of any of these benchmarks, some of which 
have hundreds of transitions, can still be computed in milliseconds. 


Acknowledgements. We thank Hagen Völzer for input on the implementation and 
choice of benchmarks. 


References 


1. van der Aalst, W.M.P.: The application of Petri nets to workflow management. 
J. Circ. Syst. Comput. 8(1), 21-66 (1998). 
https://doi.org/10.1142/S0218126698000043 

2. van der Aalst, W.M.P., et al.: Soundness of workflow nets: classification, 
decidability, and analysis. Formal Asp. Comput. 23(3), 333-363 (2011). 
https://doi.org/10.1007/s00165-010-0161-4 

3. van der Aalst, W., van Hee, K.M.: Workflow Management: Models, Methods, and 
Systems. MIT Press, Cambridge (2004) 

4. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge 
(2008) 

5. Botezatu, M., Völzer, H., Thiele, L.: The complexity of deadline analysis for 
workflow graphs with multiple resources. In: La Rosa, M., Loos, P., Pastor, O. (eds.) 
BPM 2016. LNCS, vol. 9850, pp. 252-268. Springer, Cham (2016). 
https://doi.org/10.1007/978-3-319-45348-4_15 

6. Carlier, J., Chrétienne, P.: Timed Petri net schedules. In: Rozenberg, G. (ed.) APN 
1987. LNCS, vol. 340, pp. 62-84. Springer, Heidelberg (1988). 
https://doi.org/10.1007/3-540-50580-6_24 

7. Desel, J., Erwin, T.: Modeling, simulation and analysis of business processes. In: 
van der Aalst, W., Desel, J., Oberweis, A. (eds.) Business Process Management. 
LNCS, vol. 1806, pp. 129-141. Springer, Heidelberg (2000). 
https://doi.org/10.1007/3-540-45594-9_9 

8. van Dongen, B.F.: BPI Challenge 2017 (2017). 
https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b 

9. Elmaghraby, S.E.: Activity Networks: Project Planning and Control by Network 
Models. Wiley, Hoboken (1977) 

10. Esparza, J., Hoffmann, P.: Reduction rules for colored workflow nets. In: Stevens, 
P., Wasowski, A. (eds.) FASE 2016. LNCS, vol. 9633, pp. 342-358. Springer, 
Heidelberg (2016). https://doi.org/10.1007/978-3-662-49665-7_20 

11. Esparza, J., Hoffmann, P., Saha, R.: Polynomial analysis algorithms for free choice 
probabilistic workflow nets. Perform. Eval. 117, 104-129 (2017). 
https://doi.org/10.1016/j.peva.2017.09.006 


12. Fahland, D., et al.: Instantaneous soundness checking of industrial business process 
models. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, 
vol. 5701, pp. 278-293. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-642-03848-8_19 

13. Favre, C., Fahland, D., Völzer, H.: The relationship between workflow graphs and 
free-choice workflow nets. Inf. Syst. 47, 197-219 (2015). 
https://doi.org/10.1016/j.is.2013.12.004 

14. Favre, C., Völzer, H., Müller, P.: Diagnostic information for control-flow analysis of 
workflow graphs (a.k.a. free-choice workflow nets). In: Chechik, M., Raskin, J.-F. 
(eds.) TACAS 2016. LNCS, vol. 9636, pp. 463-479. Springer, Heidelberg (2016). 
https://doi.org/10.1007/978-3-662-49674-9_27 

15. Gaubert, S., Mairesse, J.: Asymptotic analysis of heaps of pieces and application to 
timed Petri nets. In: Proceedings of the 8th International Workshop on Petri Nets 
and Performance Models, PNPM 1999, Zaragoza, Spain, 8-10 September 1999, pp. 
158-169 (1999). https://doi.org/10.1109/PNPM.1999.796562 

16. Gaubert, S., Mairesse, J.: Modeling and analysis of timed Petri nets using heaps 
of pieces. IEEE Trans. Autom. Control 44(4), 683-697 (1999). 
https://doi.org/10.1109/9.754807 

17. Hagstrom, J.N.: Computational complexity of PERT problems. Networks 18(2), 
139-147 (1988). https://doi.org/10.1002/net.3230180206 

18. Mannhardt, F., de Leoni, M., Reijers, H.A.: The multi-perspective process explorer. 
In: Proceedings of the BPM Demo Session 2015 Co-located with the 13th 
International Conference on Business Process Management, BPM 2015, Innsbruck, 
Austria, 2 September 2015, pp. 130-134 (2015) 

19. Meyer, P.J., Esparza, J., Offtermatt, P.: Computing the expected execution time 
of probabilistic workflow nets. arXiv:1811.06961 [cs.LO] (2018) 

20. Provan, J.S., Ball, M.O.: The complexity of counting cuts and of computing the 
probability that a graph is connected. SIAM J. Comput. 12(4), 777-788 (1983). 
https://doi.org/10.1137/0212053 

21. Rodrigues, A., et al.: Stairway to value: mining a loan application process (2017). 
https://www.win.tue.nl/bpi/lib/exe/fetch.php?media=2017:bpi2017_winner_academic.pdf 

22. Rozenberg, G., Thiagarajan, P.S.: Petri nets: basic notions, structure, behaviour. 
In: de Bakker, J.W., de Roever, W.-P., Rozenberg, G. (eds.) Current Trends in 
Concurrency. LNCS, vol. 224, pp. 585-668. Springer, Heidelberg (1986). 
https://doi.org/10.1007/BFb0027048 

23. Valiant, L.G.: The complexity of computing the permanent. Theoret. Comput. Sci. 
8, 189-201 (1979). https://doi.org/10.1016/0304-3975(79)90044-6 

24. Meyer, P.J., Esparza, J., Offtermatt, P.: Artifact and instructions to generate 
experimental results for TACAS 2019 paper: Computing the Expected Execution 
Time of Probabilistic Workflow Nets (artifact). Figshare (2019). 
https://doi.org/10.6084/m9.figshare.7831781.v1 




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 



Shepherding Hordes of Markov Chains 


Milan Češka¹, Nils Jansen², Sebastian Junges³(✉), and Joost-Pieter Katoen³ 


1 Brno University of Technology, Brno, Czech Republic 
2 Radboud University, Nijmegen, The Netherlands 
3 RWTH Aachen University, Aachen, Germany 


sebastian.junges@cs.rwth-aachen.de 


Abstract. This paper considers large families of Markov chains (MCs) 
that are defined over a set of parameters with finite discrete domains. 
Such families occur in software product lines, planning under partial 
observability, and sketching of probabilistic programs. Simple questions, 
like ‘does at least one family member satisfy a property?’, are NP-hard. 
We tackle two problems: distinguish family members that satisfy a given 
quantitative property from those that do not, and determine a family 
member that satisfies the property optimally, i.e., with the highest prob- 
ability or reward. We show that combining two well-known techniques, 
MDP model checking and abstraction refinement, mitigates the compu- 
tational complexity. Experiments on a broad set of benchmarks show that 
in many situations, our approach is able to handle families of millions of 
MCs, providing superior scalability compared to existing solutions. 


1 Introduction 


Randomisation is key to research fields such as dependability (uncertain sys- 
tem components), distributed computing (symmetry breaking), planning (unpre- 
dictable environments), and probabilistic programming. Families of alternative 
designs differing in the structure and system parameters are ubiquitous. Software 
dependability has to cope with configuration options, in distributed computing 
the available memory per process is highly relevant, in planning the observabil- 
ity of the environment is pivotal, and program synthesis is all about selecting 
correct program variants. The automated analysis of such families has to face 
a formidable challenge—in addition to the state-space explosion affecting each 
family member, the family size typically grows exponentially in the number of 
features, options, or observations. This affects the analysis of (quantitative) 
software product lines [18,28,43,45,46], strategy synthesis in planning under partial 
observability [12,14,29,36,41], and probabilistic program synthesis [9,13,27,40]. 

This paper considers families of Markov chains (MCs) to describe config- 
urable probabilistic systems. We consider finite MC families with finite-state 
family members. Family members may have different transition probabilities 
and distinct topologies—thus different reachable state spaces. The latter aspect 
goes beyond the class of parametric MCs as considered in parameter 
synthesis [10,22,24,31] and model repair [6,16,42]. 

For an MC family $\mathfrak{D}$ and quantitative specification $\varphi$, with $\varphi$ a reachability 
probability or expected reward objective, we consider the following synthesis 
problems: (a) does some member in $\mathfrak{D}$ satisfy a threshold on $\varphi$? (aka: feasibility 
synthesis), (b) which members of $\mathfrak{D}$ satisfy this threshold on $\varphi$ and which ones 
do not? (aka: threshold synthesis), and (c) which family member(s) satisfy $\varphi$ 
optimally, e.g., with highest probability? (aka: optimal synthesis). 

The simplest synthesis problem, feasibility, is NP-complete and can naively 
be solved by analysing all individual family members—the so-called one-by-one 
approach. This approach has been used in [18] (and for qualitative systems in e.g. 
[19]), but is infeasible for large systems. An alternative is to model the family $\mathfrak{D}$ 
by a single Markov decision process (MDP), the so-called all-in-one MDP [18]. 
The initial MDP state non-deterministically chooses a family member of $\mathfrak{D}$, and 
then evolves in the MC of that member. This approach has been implemented 
in tools such as ProFeat [18], and for purely qualitative systems in [20]. The 
MDP representation avoids the individual analysis of all family members, but 
its size is proportional to the family size. This approach therefore does not scale 
to large families. A symbolic BDD-based approach is only a partial solution as 
family members may induce different reachable state-sets. 

This paper introduces an abstraction-refinement scheme over the MDP 
representation¹. The abstraction forgets in which family member the MDP operates. 
The resulting quotient MDP has a single representative for every reachable state 
in a family member. It typically provides a very compact representation of the 
family and its analysis using off-the-shelf MDP model-checking algorithms 
yields a speed-up compared to the all-in-one approach. Verifying the quotient 
MDP yields under- and over-approximations of the min and max probability 
(or reward), respectively. These bounds are safe as all consistent schedulers, i.e., 
those that pick actions according to a single family member, are contained in all 
schedulers considered on the quotient MDP. (CEGAR-based MDP model check- 
ing for partial information schedulers, a slightly different notion than restricting 
schedulers to consistent ones, has been considered in [30]. In contrast to our 
setting, [30] considers history-dependent schedulers and in this general setting 
no guarantee can be given that bounds on suprema converge [29]). 

Model-checking results of the quotient MDP do provide useful insights. This 
is evident if the resulting scheduler is consistent. If the verification reveals that 
the min probability exceeds $r$ for a specification $\varphi$ with a $\leq r$ threshold, then, 
even for inconsistent schedulers, it holds that all family members violate $\varphi$. If 
the model checking is inconclusive, i.e., the abstraction is too coarse, we iter- 
atively refine the quotient MDP by splitting the family into sub-families. We 
do so in an efficient manner that avoids rebuilding the sub-families. Refinement 
employs a light-weight analysis of the model-checking results. 


1 Classical CEGAR for model checking of software product lines has been proposed 
in [21]. This uses feature transition systems, is purely qualitative, and exploits exis- 
tential state abstraction. 
We implemented our abstraction-refinement approach using the Storm model 
checker [25]. Experiments with case studies from software product lines, 
planning, and distributed computing yield speed-ups of up to 3 orders of 
magnitude over the one-by-one and all-in-one approaches (both symbolic and 
explicit). Some benchmarks include families of millions of MCs where family 
members have thousands of states. The experiments reveal that, as opposed to 
parameter synthesis [10,24,31], the threshold has a major influence on the 
synthesis times. 

To summarise, this work presents: (a) MDP-based abstraction-refinement for 
various synthesis problems over large families of MCs, (b) a refinement strategy 
that mitigates the overhead of analysing sub-families, and (c) experiments show- 
ing substantial speed-ups for many benchmarks. Extra material can be found 
in [1,11]. 


2 Preliminaries 


We present the basic foundations for this paper; for details, we refer to [4,5]. 


Probabilistic models. A probability distribution over a finite or countably infinite 
set $X$ is a function $\mu\colon X \to [0,1]$ with $\sum_{x \in X} \mu(x) = \mu(X) = 1$. The set of 
all distributions on $X$ is denoted $\mathit{Distr}(X)$. The support of a distribution $\mu$ is 
$\mathrm{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$. A distribution is Dirac if $|\mathrm{supp}(\mu)| = 1$. 


Definition 1 (MC). A discrete-time Markov chain (MC) $D$ is a triple 
$(S, s_0, P)$, where $S$ is a finite set of states, $s_0 \in S$ is an initial state, and 
$P\colon S \to \mathit{Distr}(S)$ is a transition probability matrix. 


MCs have unique distributions over successor states at each state. Adding non- 
deterministic choices over distributions leads to Markov decision processes. 


Definition 2 (MDP). A Markov decision process (MDP) is a tuple $M = 
(S, s_0, \mathit{Act}, P)$ where $S$, $s_0$ are as in Definition 1, $\mathit{Act}$ is a finite set of actions, and 
$P\colon S \times \mathit{Act} \rightharpoonup \mathit{Distr}(S)$ is a partial transition probability function. 


The available actions in $s \in S$ are $\mathit{Act}(s) = \{a \in \mathit{Act} \mid P(s,a) \neq \bot\}$. An 
MDP with $|\mathit{Act}(s)| = 1$ for all $s \in S$ is an MC. For MCs (and MDPs), a 
state-reward function is $\mathit{rew}\colon S \to \mathbb{R}_{\geq 0}$. The reward $\mathit{rew}(s)$ is earned upon leaving $s$. 

A path of an MDP $M$ is an (in)finite sequence $\pi = s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \cdots$, where 
$s_i \in S$, $a_i \in \mathit{Act}(s_i)$, and $P(s_i, a_i)(s_{i+1}) \neq 0$ for all $i \in \mathbb{N}$. For finite $\pi$, $\mathit{last}(\pi)$ 
denotes the last state of $\pi$. The set of (in)finite paths of $M$ is $\mathit{Paths}^M_{\mathit{fin}}$ ($\mathit{Paths}^M$). 
The notions of paths carry over to MCs (actions are omitted). Schedulers resolve 
all choices of actions in an MDP and yield MCs. 


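Definition 3 (Scheduler). A scheduler for an MDP $M = (S, s_0, \mathit{Act}, P)$ is a 
function $\sigma\colon \mathit{Paths}^M_{\mathit{fin}} \to \mathit{Act}$ such that $\sigma(\pi) \in \mathit{Act}(\mathit{last}(\pi))$ for all $\pi \in \mathit{Paths}^M_{\mathit{fin}}$. 
Scheduler $\sigma$ is memoryless if $\mathit{last}(\pi) = \mathit{last}(\pi') \implies \sigma(\pi) = \sigma(\pi')$ for all 
$\pi, \pi' \in \mathit{Paths}^M_{\mathit{fin}}$. The set of all schedulers of $M$ is $\Sigma^M$. 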
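
Definition 4 (Induced Markov Chain). The MC induced by MDP $M$ and 
$\sigma \in \Sigma^M$ is given by $M_\sigma = (\mathit{Paths}^M_{\mathit{fin}}, s_0, P^\sigma)$ where: 


$$P^\sigma(\pi, \pi') = \begin{cases} P(\mathit{last}(\pi), \sigma(\pi))(s') & \text{if } \pi' = \pi \xrightarrow{\sigma(\pi)} s', \\ 0 & \text{otherwise.} \end{cases}$$ 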


Specifications. For an MC $D$, we consider unbounded reachability specifications 
of the form $\varphi = \mathbb{P}_{\sim\lambda}(\lozenge G)$ with $G \subseteq S$ a set of goal states, $\lambda \in [0,1] \subseteq \mathbb{R}$, 
and $\sim \in \{<, \leq, \geq, >\}$. The probability to satisfy the path formula $\phi = \lozenge G$ 
in $D$ is denoted by $\mathrm{Prob}(D, \phi)$. If $\varphi$ holds for $D$, that is, $\mathrm{Prob}(D, \phi) \sim \lambda$, we 
write $D \models \varphi$. Analogously, we define expected reward specifications of the form 
$\varphi = \mathbb{E}_{\sim\kappa}(\lozenge G)$ with $\kappa \in \mathbb{R}_{\geq 0}$. We refer to $\lambda$/$\kappa$ as thresholds. While we only 
introduce reachability specifications, our approaches may be extended to richer 
logics like arbitrary PCTL [32], PCTL* [3], or $\omega$-regular properties. 

For an MDP $M$, a specification $\varphi$ holds ($M \models \varphi$) if and only if 
it holds for the induced MCs of all schedulers. The maximum probability 
$\mathrm{Prob}^{\max}(M, \phi)$ to satisfy a path formula $\phi$ for an MDP $M$ is given by a 
maximising scheduler $\sigma^{\max} \in \Sigma^M$, that is, there is no scheduler $\sigma' \in \Sigma^M$ such 
that $\mathrm{Prob}(M_{\sigma^{\max}}, \phi) < \mathrm{Prob}(M_{\sigma'}, \phi)$. Analogously, we define the minimising 
probability $\mathrm{Prob}^{\min}(M, \phi)$, and the maximising (minimising) expected reward 
$\mathrm{ExpRew}^{\max}(M, \phi)$ ($\mathrm{ExpRew}^{\min}(M, \phi)$). 

The probability (expected reward) to satisfy path formula $\phi$ from state $s \in S$ 
in MC $D$ is $\mathrm{Prob}(D, \phi)(s)$ ($\mathrm{ExpRew}(D, \phi)(s)$). The notation is analogous for 
maximising and minimising probability and expected reward measures in MDPs. 
Note that the expected reward $\mathrm{ExpRew}(D, \phi)$ to satisfy path formula $\phi$ is only 
defined if $\mathrm{Prob}(D, \phi) = 1$. Accordingly, the expected reward for MDP $M$ under 
scheduler $\sigma \in \Sigma^M$ requires $\mathrm{Prob}(M_\sigma, \phi) = 1$. 


3 Families of MCs 


We present our approaches on the basis of an explicit representation of a 
family of MCs using a parametric transition probability function. While arbitrary 
probabilistic programs allow for more modelling freedom and complex parameter 
structures, the explicit representation simplifies the presentation and allows us to 
reason about practically interesting synthesis problems. In our implementation, 
we use a more flexible high-level modelling language, cf. Sect. 5. 


Definition 5 (Family of MCs). A family of MCs is defined as a tuple $\mathfrak{D} = 
(S, s_0, K, \mathfrak{P})$ where $S$ is a finite set of states, $s_0 \in S$ is an initial state, $K$ is a 
finite set of discrete parameters such that the domain of each parameter $k \in K$ 
is $T_k \subseteq S$, and $\mathfrak{P}\colon S \to \mathit{Distr}(K)$ is a family of transition probability matrices. 


The transition probability function of MCs maps states to distributions over 
successor states. For families of MCs, this function maps states to distributions 
over parameters. Instantiating each of these parameters with a value from its 
domain yields a “concrete” MC, called a realisation. 

Fig. 1. The four different realisations of $\mathfrak{D}$. 


Definition 6 (Realisation). A realisation of a family $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ is a 
function $r\colon K \to S$ where $\forall k \in K\colon r(k) \in T_k$. A realisation $r$ yields an MC 
$D_r = (S, s_0, \mathfrak{P}(r))$, where $\mathfrak{P}(r)$ is the transition probability matrix in which each 
$k \in K$ in $\mathfrak{P}$ is replaced by $r(k)$. Let $\mathcal{R}^{\mathfrak{D}}$ denote the set of all realisations for $\mathfrak{D}$. 


As a family $\mathfrak{D}$ of MCs is defined over finite parameter domains, the number of 
family members (i.e. realisations from $\mathcal{R}^{\mathfrak{D}}$) of $\mathfrak{D}$ is finite, viz. $|\mathfrak{D}| := |\mathcal{R}^{\mathfrak{D}}| = 
\prod_{k \in K} |T_k|$, but exponential in $|K|$. Subsets of $\mathcal{R}^{\mathfrak{D}}$ induce so-called subfamilies 
of $\mathfrak{D}$. While all these MCs share the same state space, their reachable states may 
differ, as demonstrated by the following example. 


Example 1 (Family of MCs). Consider a family of MCs $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ where 
$S = \{0, 1, 2, 3\}$, $s_0 = 0$, and $K = \{k_0, k_1, k_2\}$ with domains $T_{k_0} = \{0\}$, $T_{k_1} = 
\{0, 1\}$, and $T_{k_2} = \{2, 3\}$. The parametric transition function is defined by: 

$\mathfrak{P}(0) = 0.5\colon k_0 + 0.5\colon k_1$      $\mathfrak{P}(1) = 0.5\colon k_1 + 0.5\colon k_2$ 
$\mathfrak{P}(2) = 1\colon k_2$                  $\mathfrak{P}(3) = 0.5\colon k_1 + 0.5\colon k_2$ 


Figure 1 shows the four MCs that result from the realisations $\{r_1, r_2, r_3, r_4\} = 
\mathcal{R}^{\mathfrak{D}}$ of $\mathfrak{D}$. States that are unreachable from the initial state are greyed out. 
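
The following minimal sketch enumerates the realisations of the family from Example 1 and lists the states reachable in each resulting MC. The dictionary encoding and the function names are illustrative assumptions, and the transition function is taken as $\mathfrak{P}(0) = 0.5\colon k_0 + 0.5\colon k_1$, $\mathfrak{P}(1) = \mathfrak{P}(3) = 0.5\colon k_1 + 0.5\colon k_2$, $\mathfrak{P}(2) = 1\colon k_2$.

```python
from itertools import product

# Parameter domains T_k and parametric transition function of Example 1.
DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}

def realisations(domains):
    """Return every realisation r: K -> S as a dict (one value per parameter)."""
    names = sorted(domains)
    return [dict(zip(names, vals)) for vals in product(*(domains[k] for k in names))]

def instantiate(P, r):
    """Build the transition matrix of the MC D_r by replacing each k with r(k)."""
    mc = {}
    for s, dist in P.items():
        row = {}
        for k, prob in dist.items():
            # Two parameters may point to the same successor; add their mass.
            row[r[k]] = row.get(r[k], 0.0) + prob
        mc[s] = row
    return mc

def reachable(mc, s0):
    """States reachable from s0 via transitions with positive probability."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        for t in mc[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

family = realisations(DOMAINS)
for r in family:
    print(r, sorted(reachable(instantiate(P, r), 0)))
```

The family has $1 \cdot 2 \cdot 2 = 4$ members, and the reachable states differ between members even though all of them share the state space $S$.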


We state two synthesis problems for families of MCs. The first is to identify the 
set of MCs satisfying and violating a given specification, respectively. The second 
is to find a MC that maximises/minimises a given objective. We call these two 
problems threshold synthesis and max/min synthesis. 


Problem 1 (Threshold synthesis). Let $\mathfrak{D}$ be a family of MCs and $\varphi$ a 
probabilistic reachability or expected reward specification. The threshold synthesis 
problem is to partition $\mathcal{R}^{\mathfrak{D}}$ into $T$ and $F$ such that $\forall r \in T\colon D_r \models \varphi$ and 
$\forall r \in F\colon D_r \not\models \varphi$. 


As a special case of the threshold synthesis problem, the feasibility synthesis 
problem is to find just one realisation $r \in \mathcal{R}^{\mathfrak{D}}$ such that $D_r \models \varphi$. 

Problem 2 (Max synthesis). Let $\mathfrak{D}$ be a family of MCs and $\phi = \lozenge G$ for 
$G \subseteq S$. The max synthesis problem is to find a realisation $r^* \in \mathcal{R}^{\mathfrak{D}}$ such that 
$\mathrm{Prob}(D_{r^*}, \phi) = \max_{r \in \mathcal{R}^{\mathfrak{D}}} \{\mathrm{Prob}(D_r, \phi)\}$. The problem is defined analogously for 
an expected reward measure or minimising realisations. 


Example 2 (Synthesis problems). Recall the family of MCs $\mathfrak{D}$ from Example 1. 
For the specification $\varphi = \mathbb{P}_{\geq 0.1}(\lozenge\{1\})$, the solution to the threshold synthesis 
problem is $T = \{r_2, r_3\}$ and $F = \{r_1, r_4\}$, as the goal state 1 is not reachable for 
$D_{r_1}$ and $D_{r_4}$. For $\phi = \lozenge\{1\}$, the solution to the max synthesis problem on $\mathfrak{D}$ is 
$r_2$ or $r_3$, as $D_{r_2}$ and $D_{r_3}$ have probability one to reach state 1. 


Approach 1 (One-by-one [18]). A straightforward solution to both synthesis 
problems is to enumerate all realisations $r \in \mathcal{R}^{\mathfrak{D}}$, model check the MCs $D_r$, and 
either compare all results with the given threshold or determine the maximum. 
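
As an illustration of the one-by-one approach, the sketch below enumerates the realisations of Example 1, approximates the reachability probability of each MC by value iteration, and partitions the family for a lower-bound threshold. The encoding, the helper names, and the use of plain value iteration (instead of a model checker) are assumptions made for this sketch.

```python
from itertools import product

DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}

def realisations(domains):
    names = sorted(domains)
    return [dict(zip(names, vals)) for vals in product(*(domains[k] for k in names))]

def instantiate(P, r):
    """Transition matrix of the MC D_r as nested dicts."""
    mc = {}
    for s, dist in P.items():
        row = {}
        for k, prob in dist.items():
            row[r[k]] = row.get(r[k], 0.0) + prob
        mc[s] = row
    return mc

def reach_prob(mc, s0, goal, iters=1000):
    """Approximate Prob(D, <>goal) from s0 by value iteration starting at 0."""
    v = {s: 1.0 if s in goal else 0.0 for s in mc}
    for _ in range(iters):
        v = {s: 1.0 if s in goal else sum(p * v[t] for t, p in mc[s].items())
             for s in mc}
    return v[s0]

def one_by_one(P, domains, s0, goal, lam):
    """Partition all realisations w.r.t. the specification Prob >= lam."""
    T, F = [], []
    for r in realisations(domains):
        prob = reach_prob(instantiate(P, r), s0, goal)
        (T if prob >= lam else F).append(r)
    return T, F

T, F = one_by_one(P, DOMAINS, 0, {1}, 0.1)
print("satisfying:", T)
print("violating:", F)
```

Every family member is analysed separately, so the number of model-checking calls equals the family size.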


We already saw that the number of realisations is exponential in $|K|$. 


Theorem 1. The feasibility synthesis problem is NP-complete. 


The theorem even holds for almost-sure reachability properties. The proof is a 
straightforward adaptation of results for augmented interval Markov chains [17, 
Theorem 3], partial information games [15], or partially observable MDPs [14]. 


4 Guided Abstraction-Refinement Scheme 


In the previous section, we introduced the notion of a family of MCs, two syn- 
thesis problems and the one-by-one approach. Yet, for a sufficiently high number 
of realisations such a straightforward analysis is not feasible. We propose a novel 
approach allowing us to more efficiently analyse families of MCs. 


4.1 All-in-one MDP 


We first consider a single MDP that subsumes all individual MCs of a family $\mathfrak{D}$, 
and is equipped with an appropriate action and state labelling to identify the 
underlying realisations from $\mathcal{R}^{\mathfrak{D}}$. 


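Definition 7 (All-in-one MDP [18,28,43]). The all-in-one MDP of a family 
$\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ of MCs is given as $M^{\mathfrak{D}} = (S^{\mathfrak{D}}, s_0^{\mathfrak{D}}, \mathit{Act}^{\mathfrak{D}}, P^{\mathfrak{D}})$ where $S^{\mathfrak{D}} = 
S \times \mathcal{R}^{\mathfrak{D}} \cup \{s_0^{\mathfrak{D}}\}$, $\mathit{Act}^{\mathfrak{D}} = \{a_r \mid r \in \mathcal{R}^{\mathfrak{D}}\}$, and $P^{\mathfrak{D}}$ is defined as follows: 


$$P^{\mathfrak{D}}(s_0^{\mathfrak{D}}, a_r)((s_0, r)) = 1 \quad\text{and}\quad P^{\mathfrak{D}}((s, r), a_r)((s', r)) = \mathfrak{P}(r)(s)(s').$$ 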


Example 3 (All-in-one MDP). Figure 2 shows the all-in-one MDP $M^{\mathfrak{D}}$ for the 
family $\mathfrak{D}$ of MCs from Example 1. Again, states that are not reachable from the 
initial state $s_0^{\mathfrak{D}}$ are marked grey. For the sake of readability, we only include the 
transitions and states that correspond to realisations $r_1$ and $r_2$. 

Fig. 2. Reachable fragment of the all-in-one MDP $M^{\mathfrak{D}}$ for realisations $r_1$ and $r_2$. 


From the (fresh) initial state $s_0^{\mathfrak{D}}$ of the MDP, the choice of an action $a_r$ 
corresponds to choosing the realisation $r$ and entering the concrete MC $D_r$. This 
property of the all-in-one MDP is formalised as follows. 


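Corollary 1. For the all-in-one MDP $M^{\mathfrak{D}}$ of family $\mathfrak{D}$ of MCs²: 
$$\{M^{\mathfrak{D}}_{\sigma^r} \mid \sigma^r \text{ memoryless deterministic scheduler}\} = \{D_r \mid r \in \mathcal{R}^{\mathfrak{D}}\}.$$ 


Consequently, the feasibility synthesis problem for $\varphi$ has the solution $r \in \mathcal{R}^{\mathfrak{D}}$ iff 
there exists a memoryless deterministic scheduler $\sigma^r$ such that $M^{\mathfrak{D}}_{\sigma^r} \models \varphi$. 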


Approach 2 (All-in-one [18]). Model checking the all-in-one MDP determines 
max or min probability (or expected reward) for all states, and thereby for all 
realisations, and thus provides a solution to both synthesis problems. 


As the all-in-one MDP may also be too large for realistic problems, we merely 
use it as a formal starting point for our abstraction-refinement loop. 
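
To make the size argument concrete, the following sketch builds the all-in-one MDP of Definition 7 for the family of Example 1. The encoding is an assumption of this sketch: realisations are identified by their index, the fresh initial state is the string `"init"`, and the MDP is a nested dictionary `{state: {action: distribution}}`.

```python
from itertools import product

DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}
INIT = "init"  # fresh initial state of the all-in-one MDP

def realisations(domains):
    names = sorted(domains)
    return [dict(zip(names, vals)) for vals in product(*(domains[k] for k in names))]

def all_in_one(P, domains, s0):
    """Build the all-in-one MDP; action i stands for the i-th realisation."""
    reals = realisations(domains)
    # The initial state picks a realisation and enters its copy of the MC.
    mdp = {INIT: {i: {(s0, i): 1.0} for i in range(len(reals))}}
    for i, r in enumerate(reals):
        for s, dist in P.items():
            row = {}
            for k, prob in dist.items():
                succ = (r[k], i)  # successors stay inside the copy of D_r
                row[succ] = row.get(succ, 0.0) + prob
            mdp[(s, i)] = {i: row}  # exactly one action per copied state
    return mdp

mdp = all_in_one(P, DOMAINS, 0)
print(len(mdp))  # |S| * |R| + 1 states
```

The state count grows linearly with the family size and hence exponentially with the number of parameters, which is why the text only uses this MDP as a formal starting point.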


4.2 Abstraction 


First, we define a predicate abstraction that at each state of the MDP forgets in 
which realisation we are, i.e., abstracts the second component of a state $(s, r)$. 


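Definition 8 (Forgetting). Let $M^{\mathfrak{D}} = (S^{\mathfrak{D}}, s_0^{\mathfrak{D}}, \mathit{Act}^{\mathfrak{D}}, P^{\mathfrak{D}})$ be an all-in-one 
MDP. Forgetting is an equivalence relation $\sim_f \subseteq S^{\mathfrak{D}} \times S^{\mathfrak{D}}$ satisfying 


$$(s, r) \sim_f (s', r') \iff s = s' \quad\text{and}\quad s_0^{\mathfrak{D}} \sim_f (s_0, r) \;\; \forall r \in \mathcal{R}^{\mathfrak{D}}.$$ 


Let $[s]_\sim$ denote the equivalence class w.r.t. $\sim_f$ containing state $s \in S^{\mathfrak{D}}$. 
Forgetting induces the quotient MDP $M^{\mathfrak{D}}_\sim = (S^{\mathfrak{D}}_\sim, [s_0^{\mathfrak{D}}]_\sim, \mathit{Act}^{\mathfrak{D}}, P^{\mathfrak{D}}_\sim)$, where 

$$P^{\mathfrak{D}}_\sim([s]_\sim, a_r)([s']_\sim) = \mathfrak{P}(r)(s)(s').$$ 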

At each state of the quotient MDP, the actions correspond to any realisation. It 
includes states that are unreachable in every realisation. 


Remark 1 (Action space). According to Definition 8, for every state $[s]_\sim$ there are 
$|\mathfrak{D}|$ actions. Many of these actions lead to the same distributions over successor 
states. In particular, two different realisations $r$ and $r'$ lead to the same 
distribution in $s$ if $r(k) = r'(k)$ for all $k \in K$ where $\mathfrak{P}(s)(k) \neq 0$. To avoid this spurious 
blow-up of actions, we a-priori merge all actions yielding the same distribution. 
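
The merging described in Remark 1 can be sketched as follows for the family of Example 1. In this illustrative encoding (an assumption, not the paper's data structure), an action of the quotient MDP in state $s$ is identified by the values chosen for the parameters that occur in $\mathfrak{P}(s)$, so realisations that agree on those parameters share one action.

```python
from itertools import product
from math import prod

DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}

def quotient_actions(P, domains):
    """Map each state to its merged actions and their successor distributions."""
    mdp = {}
    for s, dist in P.items():
        relevant = sorted(dist)  # parameters k with P(s)(k) != 0
        mdp[s] = {}
        for vals in product(*(domains[k] for k in relevant)):
            row = {}
            for k, t in zip(relevant, vals):
                row[t] = row.get(t, 0.0) + dist[k]
            # The action is named by the partial assignment it stands for.
            mdp[s][tuple(zip(relevant, vals))] = row
    return mdp

quotient = quotient_actions(P, DOMAINS)
merged = sum(len(acts) for acts in quotient.values())
unmerged = len(P) * prod(len(d) for d in DOMAINS.values())
print(merged, unmerged)
```

In this small family the saving is modest, but the number of merged actions per state depends only on the parameters relevant in that state rather than on the full family size.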


2 The original initial state $s_0$ of the family of MCs needs to be the initial state of $M^{\mathfrak{D}}_{\sigma^r}$. 

Fig. 3. The quotient MDP $M^{\mathfrak{D}}_\sim$ for realisations $r_1$ and $r_2$. 


In the quotient MDP under forgetting, the available actions allow a scheduler to 
switch realisations and thereby create induced MCs different from any MC in $\mathfrak{D}$. 
We formalise the notion of a consistent realisation with respect to parameters. 


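Definition 9 (Consistent realisation). For a family $\mathfrak{D}$ of MCs and $k \in K$, 
$k$-realisation-consistency is an equivalence relation $\approx_k \subseteq \mathcal{R}^{\mathfrak{D}} \times \mathcal{R}^{\mathfrak{D}}$ satisfying: 


$$r \approx_k r' \iff r(k) = r'(k).$$ 

Let $[r]_{\approx_k}$ denote the equivalence class w.r.t. $\approx_k$ containing $r \in \mathcal{R}^{\mathfrak{D}}$. 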


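Definition 10 (Consistent scheduler). For quotient MDP $M^{\mathfrak{D}}_\sim$ after forgetting 
and $k \in K$, a scheduler $\sigma \in \Sigma^{M^{\mathfrak{D}}_\sim}$ is $k$-consistent if for all $\pi, \pi' \in \mathit{Paths}^{M^{\mathfrak{D}}_\sim}_{\mathit{fin}}$: 


$$\sigma(\pi) = a_r \wedge \sigma(\pi') = a_{r'} \implies r \approx_k r'.$$ 

A scheduler is $K$-consistent (short: consistent) if it is $k$-consistent for all $k \in K$. 


Lemma 1. For the quotient MDP $M^{\mathfrak{D}}_\sim$ of family $\mathfrak{D}$ of MCs: 
$$\{(M^{\mathfrak{D}}_\sim)_{\sigma^{r*}} \mid \sigma^{r*} \text{ consistent scheduler}\} = \{D_r \mid r \in \mathcal{R}^{\mathfrak{D}}\}.$$ 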


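Proof (Idea). For $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$, we construct $\sigma^{r*} \in \Sigma^{M^{\mathfrak{D}}_\sim}$ such that $\sigma^{r*}([s]_\sim) = a_r$ 
for all $s$. Clearly $\sigma^{r*}$ is consistent, and $M^{\mathfrak{D}}_{\sigma^r} = (M^{\mathfrak{D}}_\sim)_{\sigma^{r*}}$ is obtained via a map 
between $(s, r)$ and $[s]_\sim$. For $\sigma^{r*} \in \Sigma^{M^{\mathfrak{D}}_\sim}$, we construct $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$ such that if 
$\sigma^{r*}([s]_\sim) = a_r$, then $\sigma^r(s_0^{\mathfrak{D}}) = a_r$. For all other states, we define $\sigma^r((s, r')) = a_{r'}$ 
independently of $\sigma^{r*}$. Then $M^{\mathfrak{D}}_{\sigma^r} = (M^{\mathfrak{D}}_\sim)_{\sigma^{r*}}$ is obtained as above. 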


The following theorem is a direct corollary: we need to consider exactly the 
consistent schedulers. 


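Theorem 2. For all-in-one MDP $M^{\mathfrak{D}}$ and specification $\varphi$, there exists a 
memoryless deterministic scheduler $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$ such that $M^{\mathfrak{D}}_{\sigma^r} \models \varphi$ iff there exists a 
consistent deterministic scheduler $\sigma^{r*} \in \Sigma^{M^{\mathfrak{D}}_\sim}$ such that $(M^{\mathfrak{D}}_\sim)_{\sigma^{r*}} \models \varphi$. 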

Example 4. Recall the all-in-one MDP $M^{\mathfrak{D}}$ from Example 3. The quotient MDP 
$M^{\mathfrak{D}}_\sim$ is depicted in Fig. 3. Only the transitions according to realisations $r_1$ and 
$r_2$ are included. Transitions from previously unreachable states, marked grey in 
Example 3, are now available due to the abstraction. The scheduler $\sigma \in \Sigma^{M^{\mathfrak{D}}_\sim}$ 
with $\sigma([s_0^{\mathfrak{D}}]_\sim) = a_{r_2}$ and $\sigma([1]_\sim) = a_{r_1}$ is not $k_1$-consistent as different values 
are chosen for $k_1$ by $r_1$ and $r_2$. In the MC $(M^{\mathfrak{D}}_\sim)_\sigma$ induced by $\sigma$ and $M^{\mathfrak{D}}_\sim$, the 
probability to reach state $[2]_\sim$ is one, while under realisation $r_1$, state 2 is not 
reachable. 
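
The inconsistency in Example 4 can be checked mechanically. The sketch below is an illustration under stated assumptions: Fig. 1 is not reproduced here, so the concrete values `r1 = (k0=0, k1=0, k2=2)` and `r2 = (k0=0, k1=1, k2=2)` are assumed (they are compatible with the claims of the example), the scheduler is memoryless, and states 2 and 3, which the example leaves open, follow `r1`. The check inspects all states, which is slightly stricter than looking only at states reachable under the scheduler.

```python
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}
r1 = {"k0": 0, "k1": 0, "k2": 2}  # assumed values, see lead-in
r2 = {"k0": 0, "k1": 1, "k2": 2}  # assumed values, see lead-in

def choice(r, s):
    """Action of realisation r in state s: values of the parameters relevant in s."""
    return {k: r[k] for k in P[s]}

def inconsistent_parameters(scheduler):
    """Parameters that receive more than one value across the scheduler's choices."""
    chosen = {}
    for assignment in scheduler.values():
        for k, v in assignment.items():
            chosen.setdefault(k, set()).add(v)
    return {k for k, vals in chosen.items() if len(vals) > 1}

def induced_mc(P, scheduler):
    """MC obtained by resolving every state with the scheduler's assignment."""
    mc = {}
    for s, dist in P.items():
        row = {}
        for k, prob in dist.items():
            t = scheduler[s][k]
            row[t] = row.get(t, 0.0) + prob
        mc[s] = row
    return mc

def reach_prob(mc, s0, goal, iters=1000):
    v = {s: 1.0 if s in goal else 0.0 for s in mc}
    for _ in range(iters):
        v = {s: 1.0 if s in goal else sum(p * v[t] for t, p in mc[s].items())
             for s in mc}
    return v[s0]

sigma = {0: choice(r2, 0), 1: choice(r1, 1), 2: choice(r1, 2), 3: choice(r1, 3)}
only_r1 = {s: choice(r1, s) for s in P}

print(inconsistent_parameters(sigma))          # parameters violating consistency
print(reach_prob(induced_mc(P, sigma), 0, {2}))
print(reach_prob(induced_mc(P, only_r1), 0, {2}))
```

The mixed scheduler reaches state 2 almost surely although it corresponds to no family member, which is exactly the spurious behaviour that consistency rules out.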


Approach 3 (Scheduler iteration). Enumerating all consistent schedulers 
for $M^{\mathfrak{D}}_\sim$ and analysing the induced MC provides a solution to both synthesis 
problems. 


However, optimising over exponentially many consistent schedulers solves the 
NP-complete feasibility synthesis problem, rendering such an iterative approach 
unlikely to be efficient. Another natural approach is to employ solving techniques 
for NP-complete problems, like satisfiability modulo linear real arithmetic. 


Approach 4 (SMT). A dedicated SMT-encoding (in [11]) of the induced MCs 
of consistent schedulers from $M^{\mathfrak{D}}_\sim$ solves the feasibility problem. 


4.3 Refinement Loop 


Although iterating over consistent schedulers (Approach 3) is not feasible, model 
checking of $M^{\mathfrak{D}}_\sim$ still provides useful information for the analysis of the family $\mathfrak{D}$. 
Recall the feasibility synthesis problem for $\varphi = \mathbb{P}_{\leq\lambda}(\phi)$. If $\mathrm{Prob}^{\max}(M^{\mathfrak{D}}_\sim, \phi) \leq \lambda$, 
then all realisations of $\mathfrak{D}$ satisfy $\varphi$. On the other hand, $\mathrm{Prob}^{\min}(M^{\mathfrak{D}}_\sim, \phi) > \lambda$ 
implies that there is no realisation satisfying $\varphi$. If $\lambda$ lies between the min and 
max probability, and the scheduler inducing the min probability is not consistent, 
we cannot conclude anything yet, i.e., the abstraction is too coarse. A natural 
countermeasure is to refine the abstraction represented by $M^{\mathfrak{D}}_\sim$, in particular, 
split the set of realisations leading to two synthesis sub-problems. 


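Definition 11 (Splitting). Let $\mathfrak{D}$ be a family of MCs, and $R \subseteq \mathcal{R}^{\mathfrak{D}}$ a set of 
realisations. For $k \in K$ and predicate $A_k$ over $S$, splitting partitions $R$ into 


$$R_\top = \{r \in R \mid A_k(r(k))\} \quad\text{and}\quad R_\bot = \{r \in R \mid \neg A_k(r(k))\}.$$ 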


Splitting the set of realisations, and considering the subfamilies separately, rather 
than splitting states in the quotient MDP, is crucial for the performance of the 
synthesis process as we avoid rebuilding the quotient MDP in each iteration. 
Instead, we only restrict the actions of the MDP to the particular subfamily. 


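Definition 12 (Restricting). Let $M^{\mathfrak{D}}_\sim = (S^{\mathfrak{D}}_\sim, [s_0^{\mathfrak{D}}]_\sim, \mathit{Act}^{\mathfrak{D}}, P^{\mathfrak{D}}_\sim)$ be a quotient 
MDP and $R \subseteq \mathcal{R}^{\mathfrak{D}}$ a set of realisations. The restriction of $M^{\mathfrak{D}}_\sim$ w.r.t. $R$ is the 
MDP $M^{\mathfrak{D}}_\sim[R] = (S^{\mathfrak{D}}_\sim, [s_0^{\mathfrak{D}}]_\sim, \mathit{Act}^{\mathfrak{D}}[R], P^{\mathfrak{D}}_\sim)$ where $\mathit{Act}^{\mathfrak{D}}[R] = \{a_r \mid r \in R\}$.³ 


3 Naturally, $P^{\mathfrak{D}}_\sim$ in $M^{\mathfrak{D}}_\sim[R]$ is restricted to $\mathit{Act}^{\mathfrak{D}}[R]$. 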
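
Algorithm 1. Threshold synthesis 


Input: A family $\mathfrak{D}$ of MCs with the set $\mathcal{R}^{\mathfrak{D}}$ of realisations, and specification $\mathbb{P}_{\leq\lambda}(\phi)$ 
Output: A partition of $\mathcal{R}^{\mathfrak{D}}$ into subsets $T$ and $F$ according to Problem 1. 

$F \leftarrow \emptyset$, $T \leftarrow \emptyset$, $U \leftarrow \{\mathcal{R}^{\mathfrak{D}}\}$ 
$M^{\mathfrak{D}}_\sim \leftarrow$ buildQuotientMDP($\mathfrak{D}$, $\mathcal{R}^{\mathfrak{D}}$, $\sim_f$)    ▷ Applying Def. 7 and 8 
while $U \neq \emptyset$ do 
    select $R \in U$ and $U \leftarrow U \setminus \{R\}$ 
    $M^{\mathfrak{D}}_\sim[R] \leftarrow$ restrict($M^{\mathfrak{D}}_\sim$, $R$)    ▷ Applying Def. 12 
    $(\max, \sigma_{\max}) \leftarrow$ solveMaxMDP($M^{\mathfrak{D}}_\sim[R]$, $\phi$) 
    $(\min, \sigma_{\min}) \leftarrow$ solveMinMDP($M^{\mathfrak{D}}_\sim[R]$, $\phi$) 
    if $\max \leq \lambda$ then $T \leftarrow T \cup R$ 
    if $\min > \lambda$ then $F \leftarrow F \cup R$ 
    if $\min \leq \lambda < \max$ then 
        $U \leftarrow U \cup$ split($R$, selPredicate($\max$, $\sigma_{\max}$, $\min$, $\sigma_{\min}$))    ▷ See Sect. 4.4 
return $T$, $F$ 
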


The splitting operation is the core of the proposed abstraction-refinement. Due 
to space constraints, we do not consider feasibility separately. 

Algorithm 1 illustrates the threshold synthesis process. Recall that the goal is 
to decompose the set $\mathcal{R}^{\mathfrak{D}}$ into realisations satisfying and violating a given 
specification, respectively. The algorithm uses a set $U$ to store subfamilies of $\mathcal{R}^{\mathfrak{D}}$ 
that have not yet been classified as satisfying or violating. It starts by building the 
quotient MDP with merged actions. That is, we never construct the all-in-one 
MDP, and we merge actions as discussed in Remark 1. For every $R \in U$, the 
algorithm restricts the set of realisations to obtain the corresponding subfamily. For 
the restricted quotient MDP, the algorithm runs standard MDP model checking 
to compute the max and min probability and corresponding schedulers, 
respectively. Then, the algorithm either classifies $R$ as satisfying/violating, or splits it 
based on a suitable predicate, and updates $U$ accordingly. We describe the 
splitting strategy in the next subsection. The algorithm terminates if $U$ is empty, 
i.e., all subfamilies have been classified. As only a finite number of subfamilies 
of realisations has to be evaluated, termination is guaranteed. 
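
The refinement loop of Algorithm 1 can be sketched on the family of Example 1 as follows. This is a simplified illustration under several assumptions: subfamilies are represented as products of restricted parameter domains, the max and min probabilities are approximated by value iteration from 0 (so comparisons with the threshold are only reliable when the true values are not extremely close to it), and `split` uses a naive heuristic instead of the predicate selection of Sect. 4.4. All names are hypothetical.

```python
from itertools import product

DOMAINS = {"k0": {0}, "k1": {0, 1}, "k2": {2, 3}}
P = {
    0: {"k0": 0.5, "k1": 0.5},
    1: {"k1": 0.5, "k2": 0.5},
    2: {"k2": 1.0},
    3: {"k1": 0.5, "k2": 0.5},
}

def actions(s, sub):
    """Successor distributions of state s in the quotient MDP restricted to `sub`."""
    relevant = sorted(P[s])
    dists = []
    for vals in product(*(sorted(sub[k]) for k in relevant)):
        row = {}
        for k, t in zip(relevant, vals):
            row[t] = row.get(t, 0.0) + P[s][k]
        dists.append(row)
    return dists

def solve(sub, goal, opt, s0=0, iters=200):
    """Approximate the max (opt=max) or min (opt=min) reachability probability."""
    acts = {s: actions(s, sub) for s in P}
    v = {s: 1.0 if s in goal else 0.0 for s in P}
    for _ in range(iters):
        v = {s: 1.0 if s in goal else
                opt(sum(p * v[t] for t, p in d.items()) for d in acts[s])
             for s in P}
    return v[s0]

def split(sub):
    """Naive splitting: halve the first parameter domain with several values."""
    k = next(k for k in sorted(sub) if len(sub[k]) > 1)
    vals = sorted(sub[k])
    mid = len(vals) // 2
    return [{**sub, k: set(vals[:mid])}, {**sub, k: set(vals[mid:])}]

def threshold_synthesis(domains, goal, lam):
    """Classify subfamilies w.r.t. the specification Prob <= lam."""
    T, F, U = [], [], [dict(domains)]
    while U:
        sub = U.pop()
        hi, lo = solve(sub, goal, max), solve(sub, goal, min)
        if hi <= lam:
            T.append(sub)        # every member satisfies the specification
        elif lo > lam:
            F.append(sub)        # every member violates the specification
        else:
            U.extend(split(sub)) # abstraction too coarse: refine
    return T, F

T, F = threshold_synthesis(DOMAINS, {1}, 0.5)
print("satisfying subfamilies:", T)
print("violating subfamilies:", F)
```

For a singleton subfamily the max and min values coincide, so one of the first two branches always applies and `split` is never called on a family that cannot be split further.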

The refinement loop for max synthesis is very similar, cf. Algorithm 2. Recall 
that now the goal is to find the realisation $r^*$ that maximises the satisfaction 
probability $\max^*$ of a path formula. The difference between the algorithms lies 
in the interpretation of the results of the underlying MDP model checking. If 
the max probability for $R$ is below $\max^*$, $R$ can be discarded. Otherwise, we 
check whether the corresponding scheduler $\sigma_{\max}$ is consistent. If consistent, the 
algorithm updates $r^*$ and $\max^*$, and discards $R$. If the scheduler is not consistent 
but $\min > \max^*$ holds, we can still update $\max^*$ and improve the pruning 
process, as it means that some realisation (we do not know which) in $R$ induces 
a higher probability than $\max^*$. Regardless of whether $\max^*$ has been updated, the 
algorithm has to split based on some predicate, and analyse its subfamilies as 
they may include the maximising realisation. 
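Algorithm 2. Max synthesis

Input: A family $\mathfrak{D}$ of MCs with the set $\mathcal{R}^{\mathfrak{D}}$ of realisations, and a path formula $\varphi$
Output: A realisation $r^* \in \mathcal{R}^{\mathfrak{D}}$ according to Problem 2.

 1: $\max^* \leftarrow -\infty$, $U \leftarrow \{\mathcal{R}^{\mathfrak{D}}\}$
 2: $M^{\mathfrak{D}} \leftarrow$ buildQuotientMDP($\mathfrak{D}, \mathcal{R}^{\mathfrak{D}}, \sim_f$)          ▷ Applying Def. 7 and 8
 3: while $U \neq \emptyset$ do
 4:     select $\mathcal{R} \in U$ and $U \leftarrow U \setminus \{\mathcal{R}\}$
 5:     $M^{\mathfrak{D}}[\mathcal{R}] \leftarrow$ restrict($M^{\mathfrak{D}}, \mathcal{R}$)          ▷ Applying Def. 12
 6:     $(\max, \sigma_{\max}) \leftarrow$ solveMaxMDP($M^{\mathfrak{D}}[\mathcal{R}], \varphi$)
 7:     $(\min, \sigma_{\min}) \leftarrow$ solveMinMDP($M^{\mathfrak{D}}[\mathcal{R}], \varphi$)
 8:     if $\max > \max^*$ then
 9:         if isConsistent($\sigma_{\max}$) then $r^* \leftarrow \sigma_{\max}$, $\max^* \leftarrow \max$
10:         else
11:             if $\min > \max^*$ then $\max^* \leftarrow \min$
12:             $U \leftarrow U \cup$ split($\mathcal{R}$, selPredicate($\max, \sigma_{\max}, \min, \sigma_{\min}$))          ▷ See Sect. 4.4
13: return $r^*$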
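
The refinement loop can be illustrated on a toy problem. The following is a minimal sketch in the spirit of Algorithm 2, not the authors' implementation: instead of a quotient MDP, it assumes a hypothetical objective that is a sum of terms over parameters, bounds each subfamily by optimising every term independently (an over-approximation, like the MDP), and calls the result consistent when all terms pick the same value for every shared parameter. All function names (`evaluate`, `bounds`, `split`, `max_synthesis`) are illustrative assumptions. Unlike Algorithm 2, the sketch tracks the lower bound separately from the value of the best known realisation, so that a bound which is only known to be attainable never prunes the subfamily attaining it.

```python
from itertools import product


def evaluate(realisation, terms):
    """Exact value of a single realisation (sum of all terms)."""
    return sum(fn(**{p: realisation[p] for p in params}) for params, fn in terms)


def bounds(sub, terms, pick):
    """Optimise every term independently over the subfamily `sub`.

    Returns (bound, realisation or None). A realisation is returned only if
    the per-term choices agree on every shared parameter ("consistent").
    """
    total, chosen, consistent = 0.0, {}, True
    for params, fn in terms:
        options = [dict(zip(params, vals))
                   for vals in product(*(sub[p] for p in params))]
        best = pick(options, key=lambda o: fn(**o))
        total += fn(**best)
        for p, v in best.items():
            if chosen.setdefault(p, v) != v:
                consistent = False
    if not consistent:
        return total, None
    # Parameters not mentioned by any term get an arbitrary admissible value.
    return total, {p: chosen.get(p, vals[0]) for p, vals in sub.items()}


def split(sub):
    """Halve the domain of the first parameter that still offers a choice."""
    p = next(k for k, vals in sub.items() if len(vals) > 1)
    mid = len(sub[p]) // 2
    return [{**sub, p: sub[p][:mid]}, {**sub, p: sub[p][mid:]}]


def max_synthesis(family, terms):
    best_value, best_real = float("-inf"), None  # value of a known realisation
    bound = float("-inf")                         # largest lower bound seen
    todo = [family]
    while todo:
        sub = todo.pop()
        upper, witness = bounds(sub, terms, max)
        lower, _ = bounds(sub, terms, min)
        if upper < bound or upper <= best_value:
            continue                              # cannot improve: discard
        if witness is not None:                   # consistent: exact maximum
            best_value, best_real = upper, witness
            bound = max(bound, upper)
        else:                                     # inconclusive: refine
            bound = max(bound, lower)
            todo.extend(split(sub))
    return best_real, best_value
```

Here `family` maps each parameter to a tuple of admissible values, and `terms` is a list of `(parameter_names, function)` pairs; both encodings are assumptions made for this sketch only.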


4.4 Splitting Strategies 


If verifying the quotient MDP $M^{\mathfrak{D}}[\mathcal{R}]$ cannot classify the (sub-)realisation $\mathcal{R}$
as satisfying or violating, we split $\mathcal{R}$, while we guide the splitting strategy by
using the obtained verification results. The splitting operation chooses a suitable
parameter $k \in K$ and predicate $A_k$ that partition the realisations $\mathcal{R}$ into
$\mathcal{R}_\top$ and $\mathcal{R}_\bot$ (see Definition 11). A good splitting strategy globally reduces
the number of model-checking calls required to classify all $r \in \mathcal{R}$.

The two key aspects to locally determine a good $k$ are: (1) the variance, that
is, how the splitting may narrow the difference between
$\max = \mathrm{Prob}^{\max}(M^{\mathfrak{D}}[\mathcal{X}], \varphi)$ and $\min = \mathrm{Prob}^{\min}(M^{\mathfrak{D}}[\mathcal{X}], \varphi)$
for both $\mathcal{X} = \mathcal{R}_\top$ and $\mathcal{X} = \mathcal{R}_\bot$, and (2) the consistency, that is, how the
splitting may reduce the inconsistency of the schedulers $\sigma_{\max}$ and $\sigma_{\min}$.
These aspects cannot be evaluated precisely without applying all the split
operations and solving the new MDPs $M^{\mathfrak{D}}[\mathcal{R}_\top]$ and $M^{\mathfrak{D}}[\mathcal{R}_\bot]$. Therefore, we
propose an efficient strategy that selects $k$ and $A_k$ based on a lightweight
analysis of the model-checking results for $M^{\mathfrak{D}}[\mathcal{R}]$. The strategy applies two
scores variance(k) and consistency(k) that estimate the influence of $k$ on the
two key aspects. For any $k$, the scores are accumulated over all important states
$s$ (reachable via $\sigma_{\max}$ or $\sigma_{\min}$, respectively) where $\mathfrak{B}(s)(k) \neq 0$.
A state $s$ is important for $\mathcal{R}$ and some $\delta \in \mathbb{R}_{\geq 0}$ if

$$\frac{\mathrm{Prob}^{\max}(M^{\mathfrak{D}}[\mathcal{R}], \varphi)(s) - \mathrm{Prob}^{\min}(M^{\mathfrak{D}}[\mathcal{R}], \varphi)(s)}{\mathrm{Prob}^{\max}(M^{\mathfrak{D}}[\mathcal{R}], \varphi) - \mathrm{Prob}^{\min}(M^{\mathfrak{D}}[\mathcal{R}], \varphi)} \geq \delta,$$

where $\mathrm{Prob}^{\min}(\cdot)(s)$ and $\mathrm{Prob}^{\max}(\cdot)(s)$ are the min and max probability in the
MDP with initial state $s$. To reduce the overhead of computing the scores, we
simplify the scheduler representation. In particular, for $\sigma_{\max}$ and every
$k \in K$, we extract a map $C^k_{\max} : T_k \to \mathbb{N}$, where $C^k_{\max}(t)$ is the number of
important states for which $\sigma_{\max}(s) = \alpha_r$ with $r(k) = t$. The mapping
$C^k_{\min}$ represents $\sigma_{\min}$.

We define variance(k) $= \sum_{t \in T_k} |C^k_{\max}(t) - C^k_{\min}(t)|$, leading to high scores if
the two schedulers vary a lot. Further, we define
consistency(k) $= \mathrm{size}(C^k_{\max}) \cdot \max(C^k_{\max}) + \mathrm{size}(C^k_{\min}) \cdot \max(C^k_{\min})$,
where $\mathrm{size}(C) = |\{t \in T_k \mid C(t) > 0\}| - 1$ and $\max(C) = \max_{t \in T_k}\{C(t)\}$, leading
to high scores if the parameter has clear favourites for $\sigma_{\max}$ and $\sigma_{\min}$, but
values from its full range are chosen.

As indicated, we consider different strategies for the two synthesis problems. 
For threshold synthesis, we favour the impact on the variance as we principally do 
not need consistent schedulers. For the max synthesis, we favour the impact on 
the consistency, as we need a consistent scheduler inducing the max probability. 

Predicate $A_k$ is based on reducing the variance: The strategy selects $T' \subset T_k$
with $|T'| = \lceil \tfrac{1}{2} |T_k| \rceil$, containing those $t$ for which
$C^k_{\max}(t) - C^k_{\min}(t)$ is the largest. The goal is to get a set of realisations that
induce a large probability (the ones including $T'$ for parameter $k$) and the
complement inducing a small probability.
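
The two scores and the choice of $T'$ can be sketched in a few lines of Python. This is an illustrative sketch only: the scheduler encoding (a dictionary mapping each state to the parameter values selected by its chosen action) and all function names are assumptions, and the rounding $\lceil |T_k|/2 \rceil$ is one reading of the size of $T'$.

```python
from collections import Counter
import math


def count_choices(scheduler, important_states, k):
    """C^k: value of parameter k -> number of important states choosing it."""
    return Counter(scheduler[s][k] for s in important_states if k in scheduler[s])


def variance(c_max, c_min, domain):
    """Sum over t in T_k of |C_max(t) - C_min(t)|."""
    return sum(abs(c_max[t] - c_min[t]) for t in domain)


def size(c, domain):
    """Number of values of T_k that are chosen at least once, minus one."""
    return sum(1 for t in domain if c[t] > 0) - 1


def consistency(c_max, c_min, domain):
    """size(C_max) * max(C_max) + size(C_min) * max(C_min)."""
    return (size(c_max, domain) * max(c_max[t] for t in domain)
            + size(c_min, domain) * max(c_min[t] for t in domain))


def select_values(c_max, c_min, domain):
    """Pick the ceil(|T_k| / 2) values t with the largest C_max(t) - C_min(t)."""
    n = math.ceil(len(domain) / 2)
    ranked = sorted(domain, key=lambda t: c_max[t] - c_min[t], reverse=True)
    return set(ranked[:n])
```

A `Counter` returns 0 for values that are never chosen, which matches the convention that $C(t) = 0$ for unused $t \in T_k$.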


Approach 5 (MDP-based abstraction refinement). The methods under- 
lying Algorithms 1 and 2, together with the splitting strategies, provide solutions 
to the synthesis problems and are referred to as MDP abstraction methods. 


5 Experiments 


We implemented the proposed synthesis methods as a Python prototype using 
Storm [25]. In particular, we use the Storm Python API for model adaptation,
building, and checking as well as for scheduler extraction. For SMT solving,
we use Z3 [39] via pySMT [26]. The tool-chain takes a PRISM [38] or JANI [8]
model with open integer constants, together with a set of expressions with
possible values for these constants. The model may include the parallel composition
of several modules/automata. The open constants may occur in guards^4, probability
definitions, and updates of the commands/edges. Via adequate annotations,
we identify the parameter values that yield a particular action. The annota- 
tions are key to interpret the schedulers, and to restrict the quotient without 
rebuilding. 

All experiments were executed on a Macbook MF839LL/A with an 8 GB RAM
memory limit and a 12 h time-out. All algorithms can significantly benefit from
coarse-grained parallelisation, which we therefore do not consider here.


5.1 Research Questions and Benchmarks 


The goal of the experimental evaluation is to answer the research question:
How do the proposed MDP-based abstraction methods (Approaches 3-5) cope
with the inherent complexity (i.e. the NP-hardness) of the synthesis problems
(cf. Problems 1 and 2)? To answer this question, we compare their performance
with Approaches 1 and 2 [18], representing state-of-the-art solutions and
the base-line algorithms. The experiments show that the performance of the
MDP abstraction significantly varies for different case studies. Thus, we consider
benchmarks from various application domains to identify the key characteristics
of the synthesis problems affecting the performance of our approach.


4 Slight care by the user is necessary to avoid deadlocks.


Table 1. Benchmarks and timings for Approaches 1-3

Bench.   Range           |K|   |D|      | Member size          | Quotient size              | Run time (s)
                                        | Avg. |S|   Avg. |T|  | |S|      |A|      |T|      | 1-by-1   All-in-1   Sched. Enum.
Pole     [3.35, 3.82]    17    1327104  | 5689       16896     | 6793     7897     22416    | 130k*    MO         26k
Maze     [9.8, 9800]     20    1048576  | 134        211       | 203      277      409      | 28k*     TO         2.7k
Herman   [1.86, 2.44]    9     576      | 5287       6948      | 21313    102657   184096   | 55*      72         246
DPM      [68, 210]       9     32768    | 5572       18147     | 35154    66096    160146   | 2.9k*    MO         7.2k
BSN      [0, 0.988]      10    1024     | 116        196       | 382      457      762      | 31*      2          2


Benchmarks description. We consider the following case studies: Maze is a plan- 
ning problem typically considered as POMDP, e.g. in [41]. The family describes 
all MCs induced by small-memory [14,35] observation-based deterministic strate- 
gies (with a fixed upper bound on the memory). We are interested in the 
expected time to the goal. In [35], parameter synthesis was used to find ran- 
domised strategies, using [22]. Pole considers balancing a pole in a noisy and 
unknown environment (motivated by [2,12]). At deploy time, the controller has 
a prior over a finite set of environment behaviours, and should optimise the 
expected behavior without depending on the actual (hidden) environment. The 
family describes schedulers that do not depend on the hidden information. We 
are interested in the expected time until failure. Herman is an asynchronous 
encoding of the distributed Herman protocol for self-stabilising rings [33,37]. 
The protocol is extended with a bit of memory for each station in the ring, 
and the choice to flip various unfair coins. Nodes in the ring are anonymous, 
they all behave equivalently (but may change their local memory based on local 
events). The family describes variations of memory-updates and coin-selection, 
but preserves anonymity. We are interested in the expected time until stabilisa- 
tion. DPM considers a partial information scheduler for a disk power manager 
motivated by [7,27]. We are interested in the expected energy consumption. 
BSN (Body sensor network, [43]) describes a network of connected sensors that 
identify health-critical situations. We are interested in the reliability. The family 
contains various configurations of the used sensors. BSN is the largest software 
product line benchmark used in [18]. We drop some implications between fea- 
tures (parameters for us) as this is not yet supported by our modelling language. 
We thereby extended the family. 

Table 1 shows the relevant statistics for each benchmark: the benchmark
name, the (approximate) range of the min and max probability/reward for the
given family, the number of non-singleton parameters $|K|$, and the number of
family members $|\mathfrak{D}|$. Then, for the family members the average number of states
and transitions of the MCs, and the states, actions ($= \sum_{s \in S} |Act(s)|$), and
transitions of the quotient MDP. Finally, it lists in seconds the run time of the
base-line algorithms and the consistent scheduler enumeration. The base-line
algorithms employ the one-by-one and the all-in-one technique, using either a BDD
or a sparse matrix representation. We report the best results. MOs indicate
breaking the memory limit. Only the all-in-one approach required significant
memory. As expected, the SMT-based implementation provides an inferior
performance and thus we do not report its results.


Table 2. Results for threshold synthesis via abstraction-refinement

Inst     λ      Below     Subf.   Above     Subf.   Singles | # Iter   Time   Build   Check   Anal.   Speedup
                          below             above
Pole     3.37   697       176     1326407   2186    920     | 4723     308    117     60      118     421
         3.73   1307077   7854    20027     3279    1294    | 22265    1.7k   576     317     396     77
         3.76   1322181   3140    4923      1025    1022    | 8329     584    187     114     197     222
         3.79   1326502   572     602       123     74      | 1389     58     23      10      23      2.2k

Maze     10     4         3       1048572   92      4       | 189      5      <1      3       <1      26k
         20     4247      2297    1044329   4637    3400    | 13867    114    21      43      29      246
         30     18188     9934    1030388   18004   14010   | 55875    608    80      127     270     46
         8000   1046285   846     2291      1125    969     | 3941     136    9       106     13      1.0k

Herman   1.9    6         6       570       368     320     | 747      333    303     11      18      0.2
         1.71   0         0       576       258     184     | 515      232    206     8       17      0.3

DPM      80     160       141     32608     1292    356     | 2865     1.0k   602     322     64      3
         70     6         6       32762     443     40      | 897      380    190     156     32      8
         60     0         0       32768     104     6       | 207      99     42      48      8       29

BSN      .965   544       81      480       81      25      | 321      2      <1      <1      <1      1
         .985   994       41      30        8       5       | 97       <1     <1      <1      <1


5.2 Results and Discussion 


To simplify the presentation, we focus primarily on the threshold synthesis prob- 
lem as it allows a compact presentation of the key aspects. Below, we provide 
some remarks about the performance for the max and feasibility synthesis. 


Results. Table 2 shows results for threshold synthesis. The first two columns
indicate the benchmark and the various thresholds. For each threshold $\lambda$, the
table lists the number of family members below (above) $\lambda$, each with the number
of subfamilies that together contain these instances, and the number of singleton
subfamilies that were considered. The last table part gives the number of
iterations of the loop in Algorithm 1, and timing information (total, build/restrict
times, model checking times, scheduler analysis times). The last column gives
the speed-up over the best base-line (based on the estimates).


Key observations. The speed-ups vary drastically, which shows that the MDP
abstraction often achieves a superior performance but may also lead to a
performance degradation in some cases. We identify four key factors.


5 Values with a * are estimated by sampling a large fraction of the family. 
Iterations. As is typical for CEGAR approaches, the key characteristic of the
benchmark that affects the performance is the number N of iterations in the
refinement loop. The abstraction introduces an overhead per iteration caused
by performing two MDP verification calls and by the scheduler analysis. The
run time for BSN, with a small $|\mathfrak{D}|$, is actually significantly affected by the
initialisation of various data structures; thus only a small speedup is achieved.


Abstraction size. The size of the quotient, compared to the average size of 
the family members, is relevant too. The quotient includes at least all reachable 
states of all family members, and may be significantly larger if an inconsistent 
scheduler reaches states which are unreachable under any consistent scheduler. 
The existence of such states is a common artefact from encoding families in 
high-level languages. Table 1, however, indicates that we obtain a very compact 
representation for Maze and Pole. 


Thresholds. The most important aspect is the threshold $\lambda$. If $\lambda$ is closer to the
optima, the abstraction requires a smaller number of iterations, which directly
improves the performance. We emphasise that in various domains, thresholds
that ask for close-to-optimal solutions are indeed of highest relevance as they
typically represent the system designs developers are most interested in [44]. Why
do thresholds affect the number of iterations? Consider a family with $T_k = \{0,1\}$
for each $k$. Geometrically, the set $\mathcal{R}^{\mathfrak{D}}$ can be visualised as a $|K|$-dimensional cube.
The cube-vertices reflect family members. Assume for simplicity that one of
these vertices is optimal with respect to the specification. Especially in bench- 
marks where parameters are equally important, the induced probability of a 
vertex roughly corresponds to the Manhattan distance to the optimal vertex. 
Thus, vertices above the threshold induce a diagonal hyperplane, which our 
splitting method approximates with orthogonal splits. Splitting diagonally is 
not possible, as it would induce optimising over observation-based schedulers. 
Consequently, we need more and more splits the more the diagonal goes through 
the middle of the cube. Even when splitting optimally, there is a combinato- 
rial blow-up in the required splits when the threshold is further from the optimal 
values. Another effect is that thresholds far from optima are more affected by 
the over-approximation of the MDP model-checking results and thus yield more 
inconclusive answers. 
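
The effect of the threshold on the number of splits can be reproduced on an idealised version of this cube. The sketch below is an illustration under strong assumptions that are not part of the evaluation: the value of a vertex is exactly its Manhattan distance to an all-zero optimal vertex, the bounds of a subcube are exact, and a subcube is classified as soon as all its vertices lie on the same side of the threshold. The function name `count_subfamilies` is hypothetical.

```python
def count_subfamilies(fixed_ones: int, free: int, threshold: int) -> int:
    """Number of classified subcubes produced by orthogonal splitting.

    A subcube fixes some coordinates (`fixed_ones` of them to 1) and leaves
    `free` coordinates open. The value of a vertex is its number of ones,
    i.e. its Manhattan distance to the (assumed optimal) all-zero vertex.
    """
    lowest, highest = fixed_ones, fixed_ones + free
    if highest <= threshold or lowest > threshold:
        return 1  # every vertex lies on the same side of the threshold
    # Inconclusive: fix one more coordinate, once to 0 and once to 1.
    return (count_subfamilies(fixed_ones, free - 1, threshold)
            + count_subfamilies(fixed_ones + 1, free - 1, threshold))


if __name__ == "__main__":
    n = 10
    for threshold in range(n):
        print(threshold, count_subfamilies(0, n, threshold))
```

In this toy setting, thresholds next to either extreme need only linearly many subcubes, while thresholds that cut through the middle of the cube need far more, which mirrors the combinatorial blow-up described above.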


Refinement strategy. So far, we reasoned about optimal splits. Due to the 
computational overhead, our strategy cannot ensure optimal splits. Instead, the 
strategy depends mostly on information encoded in the computed MDP strate- 
gies. In models where the optimal parameter value heavily depends on the state, 
the obtained schedulers are highly inconsistent and carry only limited information 
for splitting. Consequently, in such benchmarks we split sub-optimally. The sub- 
optimality has a major impact on the performance for Herman as all obtained 
strategies are highly inconsistent — they take a different coin for each node, which 
is good to speed up the stabilisation of the ring. 


Summary. MDP abstraction is not a silver bullet. It has a lot of potential in
threshold synthesis when the threshold is close to the optima. Consequently,
feasibility synthesis with unsatisfiable specifications is handled perfectly well by
MDP abstraction, while this is the worst case for enumeration-based approaches.
Likewise, max synthesis can be understood as threshold synthesis with a shifting
threshold $\max^*$: If $\max^*$ is quickly set close to $\max$, MDP abstraction yields
superior performance. Roughly, we can quickly approximate $\max^*$ when some of
the parameter values are clearly beneficial for the specification.


6 Conclusion and Future Work 


We contributed to the efficient analysis of families of Markov chains. In particu- 
lar, we discussed and implemented existing approaches to solve practically inter- 
esting synthesis problems, and devised a novel abstraction refinement scheme 
that mitigates the computational complexity of the synthesis problems, as shown 
by the empirical evaluation. In the future, we will include refinement strategies 
based on counterexamples as in [23,34]. 
Abstract. Efficient optimal scheduling for concurrent systems on a
finite horizon remains a challenging task to date: Not only does time have
a continuous domain, but in addition there are exponentially many
possible decisions to choose from at every time point.

In this paper we present a solution to the problem of optimal time- 
bounded reachability for Markov automata, one of the most general 
formalisms for modelling concurrent systems. Our algorithm is based 
on the discretisation of the time horizon. In contrast to most existing 
algorithms for similar problems, the discretisation step is not fixed. We 
attempt to discretise only in those time points when the optimal sched- 
uler in fact changes its decision. Our empirical evaluation demonstrates 
that the algorithm improves on existing solutions up to several orders of 
magnitude. 


1 Introduction 


Modern technologies grow and complexify rapidly, making it hard to ensure their 
dependability and reliability. Formal approaches to describing these systems 
include (generalised) stochastic Petri nets [Mol82, MCB84, MBC+98,Bal07], 
stochastic activity networks [MMS85], dynamic fault trees [BCS10] and others. 
The semantics of these modelling languages is often defined in terms of contin- 
uous time Markov chains (CTMCs). CTMCs can model the behaviour of seem- 
ingly independent processes evolving in memoryless continuous time (according 
to exponential distributions). 

Modelling a system as a CTMC, however, strips it of any notion of choice, 
e.g., which of a number of requests to process first, or how to optimally bal- 
ance the load over multiple servers of a cluster. Making sure that the system is 
safe for all possible choices of this kind is an important issue when assessing its 
reliability. Non-determinism allows the modeller to capture these choices. Mod- 
elling systems with non-determinism is possible in formalisms such as interactive 
Markov chains [Her02], or Markov automata (MA) [EHKZ13]. The latter are one 
of the most general models for concurrent systems available and can serve as a
semantics for generalised stochastic Petri nets and dynamic fault trees. 

A similar formalism, continuous time Markov decision processes (CTMDPs) 
[Ber00,Put94], has seen wide-spread use in control theory and operations 
research. In fact, MA and CTMDPs are closely related: They both can model 
exponential Markovian transitions and non-determinism. However, MA are com- 
positional, while CTMDPs are not: In general it is not possible to model a system 
as a CTMDP by modelling each of its sub-components as smaller CTMDP and 
then combining them. This is why modelling large systems with many com- 
municating sub-components as a CTMDP is cumbersome and error-prone. In 
fact, most modern model checkers, such as Storm [DJKV17], Modest [HH14] 
and PRISM [KNP11], do not offer any support for CTMDPs. 

In the analysis of MA and CTMDPs, one of the most challenging problems is 
the approximation of optimal time-bounded reachability probability, i.e. the max- 
imal (or minimal) probability of a system to reach a set of goal states (e. g. unsafe 
states) within a given time bound. Due to the presence of non-determinism this 
value depends on which decisions are taken at which time points. Since the opti- 
mal strategy is time dependent there are continuously many different strategies. 
Classically, one deals with continuity by discretising the values, as is the case in 
most algorithms for CTMDPs and MA [Neu10, FRSZ16, HH15, BS11]: The time
horizon is discretised into finitely many intervals, and the value within each 
interval is approximated by e.g. polynomial or exponential functions. 

Discretisation is closely related to the scheduler that is optimal for a specific
MA. As an example, consider Fig. 1: The plot shows the probabilities of reaching
a goal state for a certain time bound, by choosing options 1 and 2. If less than
0.9 seconds remain, option 1 has a higher probability of reaching the goal set,
while option 2 is preferable as long as more than 0.9 seconds are left. In this
example it is enough to discretise the time horizon with roughly 2 intervals:
[0, 0.9] and (0.9, 1.5]. The algorithms known to date however use from 200 to
$2 \cdot 10^6$ intervals, which is far too many. The solution that we present in this
paper discretises the time horizon in only three intervals for this example.


Fig. 1. Reachability probability for different decisions

Our contribution consists in an algorithm that computes time bounded reach- 
ability probabilities for Markov automata. The algorithm discretises the time 
horizon by intervals of variable length, making them smaller near those time 
points where the optimal scheduler switches from one decision to another. We 
give a characterisation of these time points, as well as tight sufficient conditions 
for no such time point to exist within an interval. We present an empirical eval- 
uation of the performance of the algorithm and compare it to other algorithms 
available for Markov automata. The algorithm does perform well in the com- 
parison, improving in some cases by several orders of magnitude, but does not 
strictly outperform available solutions. 
2 Preliminaries


Given a finite set $S$, a probability distribution over $S$ is a function $\mu : S \to [0,1]$,
s.t. $\sum_{s \in S} \mu(s) = 1$. We denote the set of all probability distributions over $S$ by
Dist($S$). The sets of rational, real and natural numbers are denoted with $\mathbb{Q}$, $\mathbb{R}$
and $\mathbb{N}$ resp., $X_{\bowtie 0} = \{x \in X \mid x \bowtie 0\}$, for $X \in \{\mathbb{Q}, \mathbb{R}\}$, $\bowtie \in \{>, \geq\}$,
$\mathbb{N}_{\geq 0} = \mathbb{N} \cup \{0\}$.


Definition 1. A Markov automaton (MA)^1 is a tuple $\mathcal{M} = (S, Act, \mathbf{P}, \mathbf{Q}, G)$
where $S$ is a finite set of states partitioned into probabilistic (PS) and
Markovian (MS), $G \subseteq S$ is a set of goal states, $Act$ is a finite set of actions,
$\mathbf{P} : PS \times Act \rightharpoonup \mathrm{Dist}(S)$ is the probabilistic transition matrix,
$\mathbf{Q} : MS \times S \to \mathbb{Q}$ is the Markovian transition matrix, s.t. $\mathbf{Q}(s, s') \geq 0$ for
$s \neq s'$, and $\mathbf{Q}(s, s) = -\sum_{s' \neq s} \mathbf{Q}(s, s')$.


Figure 2 shows an example MA. Grey and white colours denote Markovian and
probabilistic states correspondingly. Transitions labelled as $\alpha$ or $\beta$ are actions
of state $s_1$. Dashed transitions associated with an action represent the
distribution assigned to the action. Purely solid transitions are Markovian.


Fig. 2. An example MA.


Notation and further definitions: For a Markovian state $s \in MS$ and $s' \neq s$, we
call $\mathbf{Q}(s, s')$ the transition rate from $s$ to $s'$. The exit rate of a Markovian
state $s$ is $E(s) := \sum_{s' \neq s} \mathbf{Q}(s, s')$. $E_{\max}$ denotes the maximal exit rate among
all the Markovian states of $\mathcal{M}$. For a probabilistic state $s \in PS$,
$Act(s) = \{\alpha \in Act \mid \exists \mu \in \mathrm{Dist}(S) : \mathbf{P}(s, \alpha) = \mu\}$ denotes the set of actions
that are enabled in $s$. $\mathbf{P}[s, \alpha, \cdot] \in \mathrm{Dist}(S)$ is defined by $\mathbf{P}[s, \alpha, s'] := \mu(s')$, where
$\mathbf{P}(s, \alpha) = \mu$. We impose the usual non-zenoness [GHH+14] restriction on MA.
This disallows e.g., probabilistic states with no outgoing transitions, or with
only self-loop transitions.


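A (timed) path in $\mathcal{M}$ is a finite or infinite sequence
$\rho = s_0 \xrightarrow{\alpha_0, t_0} s_1 \xrightarrow{\alpha_1, t_1} \cdots \xrightarrow{\alpha_k, t_k} s_{k+1} \xrightarrow{\alpha_{k+1}, t_{k+1}} \cdots$,
where $\alpha_i \in Act(s_i)$ for $s_i \in PS$, and $\alpha_i = \bot$ for $s_i \in MS$. For a finite path
$\rho = s_0 \xrightarrow{\alpha_0, t_0} s_1 \cdots \xrightarrow{\alpha_{k-1}, t_{k-1}} s_k$ we define $\rho{\downarrow} = s_k$.
The set of all finite (infinite) paths of $\mathcal{M}$ is denoted by Paths* (Paths).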

Time passes continuously in Markovian states. The system leaves the state
after the amount of time that is governed by an exponential distribution, i.e.
the probability of leaving $s \in MS$ within $t \geq 0$ time units is given by
$1 - e^{-E(s) \cdot t}$, after which the next state $s'$ is chosen with probability
$\mathbf{Q}(s, s')/E(s)$.
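
These quantities are straightforward to compute. The following sketch assumes a hypothetical encoding of the Markovian part of an MA as a nested dictionary of off-diagonal transition rates; the function names are illustrative.

```python
import math


def exit_rate(rates, s):
    """E(s): sum of the rates from s to all other states."""
    return sum(r for s2, r in rates[s].items() if s2 != s)


def leave_probability(rates, s, t):
    """Probability of leaving Markovian state s within t time units."""
    return 1.0 - math.exp(-exit_rate(rates, s) * t)


def branching(rates, s):
    """Probability Q(s, s') / E(s) of moving to each successor s'."""
    e = exit_rate(rates, s)
    return {s2: r / e for s2, r in rates[s].items() if s2 != s}
```

For example, with `rates = {"s0": {"s1": 2.0, "s2": 1.0}}` the exit rate of `s0` is 3, and the successor `s1` is chosen with probability 2/3.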

Probabilistic transitions happen instantaneously. Whenever the system is in
a probabilistic state $s$ and an action $\alpha \in Act(s)$ is chosen, the successor $s'$ is
selected according to the distribution $\mathbf{P}[s, \alpha, \cdot]$ and the system moves from $s$ to
$s'$ right away. Thus, the residence time in probabilistic states is always 0.


1 Strictly speaking, this is the definition of a closed Markov automaton in which no
state has two actions with the same label. This is however not a restriction since the
analysis of general Markov automata is always performed only after the composition
under the urgency assumption is performed. Additional renaming of the actions does
not affect the properties considered in this work.


2.1 Time-Bounded Reachability 


In this work we are interested in the probability to reach a certain set of states of 
a Markov automaton within a given time bound. However, due to the presence 
of multiple actions in probabilistic states the behaviour of a Markov automaton 
is not a stochastic process and thus no probability measure can be defined. This 
issue is resolved by introducing the notion of a scheduler. 

A general scheduler (or strategy) $\pi : \mathrm{Paths}^* \to \mathrm{Dist}(Act)$ is a measurable
function, s.t. $\forall \rho \in \mathrm{Paths}^*$, if $\rho{\downarrow} \in PS$ then $\pi(\rho) \in \mathrm{Dist}(Act(\rho{\downarrow}))$. General
schedulers provide a distribution over enabled actions of a probabilistic state given
that the path $\rho$ has been observed from the beginning of the system evolution.
We call stationary such a general scheduler $\pi$ that can be represented as
$\pi : PS \to Act$, i.e. it is non-randomised and depends only on the current state.
The set of all general (stationary) schedulers is denoted by $\Pi_{gen}$ ($\Pi_{stat}$ resp.).

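Given a general scheduler $\pi$, the behaviour of a Markov automaton is a fully
defined stochastic process. For the definition of the probability measure $\Pr_{\pi}$ on
Markov automata we refer to [Hat17].

Let $s \in S$, $T \in \mathbb{Q}_{\geq 0}$ be a time bound and $\pi \in \Pi_{gen}$ be a general scheduler.
The (time-bounded) reachability probability (or value) for a scheduler $\pi$ and state
$s$ in $\mathcal{M}$ is defined as follows:

$$\mathrm{val}^{\pi}_{s}(T) := \Pr_{\pi}\left[\Diamond^{\leq T}_{s} G\right],$$

where $\Diamond^{\leq T}_{s} G = \{ s \xrightarrow{\alpha_0, t_0} s_1 \xrightarrow{\alpha_1, t_1} \cdots \mid \exists i : s_i \in G \wedge \sum_{j=0}^{i-1} t_j \leq T \}$ is the set of
paths starting from $s$ and reaching $G$ before $T$.

For opt $\in \{\sup, \inf\}$, the optimal (time-bounded) reachability probability (or
value) of state $s$ in $\mathcal{M}$ is defined as follows:

$$\mathrm{val}_{s}(T) := \mathrm{opt}_{\pi \in \Pi_{gen}} \mathrm{val}^{\pi}_{s}(T)$$

We denote by $\mathrm{val}^{\pi}(T)$ ($\mathrm{val}(T)$) the vector of values $\mathrm{val}^{\pi}_{s}(T)$ ($\mathrm{val}_{s}(T)$)
for all $s \in S$. A general scheduler that achieves the optimum for $\mathrm{val}(T)$ is
optimal, and one that achieves a value $v$, s.t. $\|v - \mathrm{val}(T)\|_{\infty} < \varepsilon$, is
$\varepsilon$-optimal.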


Optimal Schedulers. For the time-bounded reachability problem it is known
[RS13] that there exists an optimal scheduler $\pi$ of the form $\pi : PS \times \mathbb{R}_{\geq 0} \to Act$.
This scheduler does not need to know the full history of the system, but only the
current probabilistic state it is in and the total time left until the time bound. It is
deterministic, i.e. not randomised, and additionally, this scheduler is piecewise
constant, meaning that there exists a finite partition $\mathcal{I}(\pi)$ of the time interval
$[0, T]$ into intervals $I_0 = [t_0, t_1], I_1 = (t_1, t_2], \ldots, I_{k-1} = (t_{k-1}, t_k]$, such that
$t_0 = 0$, $t_k = T$ and the value of the scheduler remains constant throughout each
interval of the partition, i.e. $\forall I \in \mathcal{I}(\pi), \forall t_1, t_2 \in I, \forall s \in PS : \pi(s, t_1) = \pi(s, t_2)$.
The value of $\pi$ on an interval $I \in \mathcal{I}(\pi)$ and $s \in PS$ is denoted by $\pi(s, I)$, i.e.
$\pi(s, I) = \pi(s, t)$ for any $t \in I$.

As an example, consider the MA in Fig. 2 and time bound $T = 1$. Here the
optimal scheduler for state $s_1$ chooses the reliable but slow action $\beta$ if there is
enough time, i.e. if at least 0.62 time units are left. Otherwise the optimal scheduler
switches to a more risky, but faster path via action $\alpha$.
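
How such a switching point arises can be shown with two closed-form value functions. The sketch below does not use the rates of Fig. 2; it assumes a hypothetical risky option that reaches the goal through a single exponential delay with success probability `p`, and a hypothetical reliable option that always succeeds but needs two consecutive exponential delays. The crossing of the two curves is located by bisection.

```python
import math


def risky(t, p=0.5, rate=2.0):
    """Succeeds with probability p, then one exponential delay to the goal."""
    return p * (1.0 - math.exp(-rate * t))


def reliable(t, rate=2.0):
    """Always succeeds, but needs two consecutive exponential delays."""
    return 1.0 - math.exp(-rate * t) * (1.0 + rate * t)


def switching_point(lo, hi, tol=1e-9):
    """Bisection for the remaining time at which both options are equally good."""
    def diff(t):
        return reliable(t) - risky(t)

    assert diff(lo) < 0.0 < diff(hi), "the interval must bracket the crossing"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With less remaining time than the returned value, the risky option has the higher reachability probability; with more remaining time, the reliable one does.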

In the literature this subclass of schedulers is sometimes referred to as
total-time positional deterministic, piecewise constant schedulers. From now on we call
a scheduler from this subclass simply a scheduler (or strategy) and denote the
set of such schedulers with $\Pi$. An important notion of schedulers is the switching
point, the point of time separating two intervals of constant decisions:


Definition 2. For a scheduler $\pi$ and $s \in PS$ we call $\tau \in \mathbb{R}_{\geq 0}$ a switching
point, iff $\exists I_1, I_2 \in \mathcal{I}(\pi)$, s.t. $\tau = \sup I_1$ and $\tau = \inf I_2$ and
$\exists s \in PS : \pi(s, I_1) \neq \pi(s, I_2)$.


Whether the switching points can be computed exactly or not is an open
problem. In fact, the Lindemann-Weierstrass theorem suggests that switching
points are non-algebraic numbers, which hints at a negative answer.


3 Related Work 


In this section we briefly review the algorithms designed to approximate time
bounded reachability probabilities. We only discuss the algorithms that guarantee
to compute an $\varepsilon$-close approximation of the reachability value.

The majority of the algorithms [Neu10,BS11,FRSZ16,SSM18, BHHK15] are 
available for continuous time Markov decision processes (CTMDPs) [Ber00]. Two 
of those, [Neu10] and [BHHK15], are also applicable to MA. We compare to them 
in our empirical evaluation in Sect. 5. All the algorithms utilise such known tech- 
niques as discretisation, uniformisation, or a combination thereof. The drawback 
of most of the algorithms is that they do not adapt to a specific instance of a 
problem. Namely, given a model M to analyse, they perform as many computa- 
tions as is needed for M, which is the worst-case model in a subclass of models 
that share certain parameters with M, such as Emax, for example. Experimental 
evaluation performed in [BHHK15] shows that such approaches are not promis- 
ing, because most of the time the algorithms perform too many unnecessary 
computations. This is not the case for [BS11] and [BHHK15]. The latter per- 
forms the analysis via uniformisation and schedulers that cannot observe time. 
The former, designed for CTMDPs, performs discretisation of the time horizon 
with intervals of variable length, however is not applicable to MA. Just like 
in [BS11], our approach is to adapt the discretisation of the time horizon to a 
specific instance of the problem. 
4 Our Solution


In this section we present a novel approach to approximating optimal
time-bounded reachability and the optimal scheduler for an arbitrary Markov
automaton. Throughout the section we work with an MA $\mathcal{M} = (S, Act, \mathbf{P}, \mathbf{Q}, G)$, time
bound $T \in \mathbb{Q}_{\geq 0}$ and precision $\varepsilon \in \mathbb{Q}_{>0}$. To simplify the presentation we
concentrate on supremum reachability probability.

Given a scheduler, computation (or approximation) of the reachability
probability is relatively easy:

Lemma 1. For a scheduler $\pi \in \Pi$ and a state $s \in S$, the function
$\mathrm{val}^{\pi}_{s} : [0, T] \to [0, 1]$ is the solution to the following system of equations:

$$\begin{aligned}
f_s(t) &= 1 && \text{if } s \in G \\
\frac{d f_s(t)}{dt} &= \sum_{s' \in S} \mathbf{Q}(s, s') \cdot f_{s'}(t) && \text{else if } s \in MS \\
f_s(t) &= \sum_{s' \in S} \mathbf{P}[s, \pi(s, t), s'] \cdot f_{s'}(t) && \text{else if } s \in PS
\end{aligned} \qquad (1)$$

$$f_s(0) = \begin{cases}
1 & \text{if } s \in G \\
\sum_{s' \in S} \mathbf{P}[s, \pi(s, 0), s'] \cdot f_{s'}(0) & \text{else if } s \in PS \\
0 & \text{otherwise}
\end{cases} \qquad (2)$$

Let $0 = \tau_0 < \tau_1 < \ldots < \tau_{k-1} < \tau_k = T$, where $\tau_i$ are the switching
points of $\pi$ for $i = 1..k-1$. The solution of the system of Equations (1)-(2)
can be obtained separately on each of the intervals $(\tau_{i-1}, \tau_i], \forall i = 1..k$, where
the value of the scheduler remains constant for all states. Given the solution
$\mathrm{val}^{\pi}_{s}(t)$ on interval $(\tau_{i-1}, \tau_i]$, we derive the solution for $(\tau_i, \tau_{i+1}]$ by using the
values $\mathrm{val}^{\pi}_{s}(\tau_i)$ as boundary conditions. Later in Sect. 4.1 we will show that
the approximation of the solution for each interval $(\tau_{i-1}, \tau_i]$ can be achieved via
a combination of known techniques, such as uniformisation (for the Markovian
states) and untimed reachability analysis (for probabilistic states).
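
For the purely Markovian case, uniformisation can be sketched as follows. This is a textbook-style illustration for a CTMC without probabilistic states, not the procedure of Sect. 4.1; the input encoding (a nested dictionary of off-diagonal rates in which every state, including goal states, appears as a key) and the function name are assumptions.

```python
import math


def ctmc_reachability(rates, goal, t, eps=1e-9):
    """Probability of reaching `goal` within time t from every state of a CTMC.

    Uniformisation: the CTMC is turned into a DTMC that jumps at the points of
    a Poisson process with rate `lam`; the result is the Poisson-weighted sum
    of the step-bounded reachability probabilities of that DTMC.
    """
    states = list(rates)
    exit_rate = {s: sum(r for s2, r in rates[s].items() if s2 != s)
                 for s in states}
    lam = max(exit_rate.values()) or 1.0  # uniformisation rate

    def step(v):
        """One backward step of the uniformised DTMC (goal states absorbing)."""
        nxt = {}
        for s in states:
            if s in goal:
                nxt[s] = 1.0
            else:
                move = sum(r / lam * v[s2]
                           for s2, r in rates[s].items() if s2 != s)
                nxt[s] = move + (1.0 - exit_rate[s] / lam) * v[s]
        return nxt

    q = lam * t
    weight = math.exp(-q)            # Poisson probability of 0 jumps
    accumulated = weight
    v = {s: 1.0 if s in goal else 0.0 for s in states}
    result = {s: weight * v[s] for s in states}
    k = 0
    while accumulated < 1.0 - eps:   # truncate once the tail is below eps
        k += 1
        v = step(v)
        weight *= q / k              # Poisson probability of k jumps
        accumulated += weight
        for s in states:
            result[s] += weight * v[s]
    return result
```

For a single transition with rate 2 into the goal, the result for time bound 1 agrees with the closed form $1 - e^{-2}$ up to the truncation error `eps`.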

Thus, given an optimal scheduler, Lemma 1 can be used to compute or
approximate the optimal reachability value. Finding an optimal scheduler is
therefore the challenge for optimal time-bounded reachability analysis. Our
solution is based on approximating the optimal reachability value up to an arbitrary
$\varepsilon > 0$ by discretising the time horizon with intervals of variable length. On each
interval the value of our $\varepsilon$-optimal scheduler remains constant. The discretisation
we use attempts to reflect the partition $\mathcal{I}(\pi)$ of a minimal^2 optimal scheduler $\pi$,
i.e. it mimics intervals on which $\pi$ has constant value.

Our solution is presented in Algorithm 1. It computes an $\varepsilon$-optimal scheduler
$\pi_{opt}$ and approximates the system of Equations (1)-(2) for $\pi_{opt}$. The algorithm
iterates over intervals of constant decisions of an $\varepsilon$-optimal strategy. At each
iteration it computes: (i) a stationary scheduler $\pi$ that is close to optimal on
the current interval (line 7), (ii) the length $\delta$ of the interval on which $\pi$
introduces an acceptable error (line 8), and (iii) the reachability values for time
$t + \delta$ (line 9). The following sections discuss the steps of the algorithm in more
detail.


2 In the size of $\mathcal{I}(\pi)$.


Theorem 1. Algorithm 1 approximates the value of an arbitrary Markov
automaton for time bound $T \in \mathbb{Q}_{\geq 0}$ up to a given $\varepsilon \in \mathbb{Q}_{>0}$.


Algorithm 1. SwitchStep 


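Input: MA $\mathcal{M} = (S, Act, \mathbf{P}, Q, G)$, time bound $T \in \mathbb{Q}_{\geq 0}$, precision $\varepsilon \in \mathbb{Q}_{>0}$
Output: $u(T) \in [0, 1]^{|S|}$, s.t. $\|u(T) - \mathrm{val}(T)\|_\infty < \varepsilon$, $\varepsilon$-optimal scheduler $\pi_{opt}$
Parameters: $w \in (0,1)$ and $\varepsilon_i < \varepsilon$, by default $w = 0.1$, $\varepsilon_i = w \cdot \varepsilon$

 1: $\delta_{min} = (1 - w) \cdot 2 \cdot (\varepsilon - \varepsilon_i) / E_{max}^2 / T$
 2: $\varepsilon_\psi = \varepsilon_r = w \cdot \varepsilon \cdot \delta_{min} / T$
 3: $t = 0$, $\varepsilon_{acc} = \varepsilon_i$
 4: $\forall s \in MS : u_s(t) = (s \in G)\,?\,1 : 0$ and $\forall s \in PS : u_s(t) = R^*_{\varepsilon_i}(s, G)$
 5: $\forall s \in PS : \pi_{opt}(s, 0) = \arg\max R^*_{\varepsilon_i}(s, G)$
 6: while $t < T$ do
 7:     $\pi$ = FINDSTRATEGY($u(t)$)
 8:     $\delta, \varepsilon_\delta$ = FINDSTEP($\mathcal{M}$, $T - t$, $\delta_{min}$, $u(t)$, $\varepsilon_\psi$, $\varepsilon_r$, $\pi$)
 9:     compute $u(t + \delta)$ according to (5) for $\varepsilon_\psi$ and $\varepsilon_r$
10:     $t = t + \delta$, $\varepsilon_{acc} = \varepsilon_{acc} + \varepsilon_\delta$
11:     $\forall s \in PS, \tau \in (0, \delta] : \pi_{opt}(s, t + \tau) = \pi(s)$
12: return $u(T)$, $\pi_{opt}$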
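The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustrative skeleton, not the IMCA implementation: the three procedures are passed in as callables with hypothetical names (`find_strategy`, `find_step`, `advance`), the initial value vector is supplied by a hypothetical `init_values` callable, and the scheduler is recorded as a list of time intervals.

```python
def switch_step(T, eps, e_max, init_values, find_strategy, find_step, advance,
                w=0.1, eps_i=None):
    """Skeleton of the main loop of Algorithm 1 (all callables are assumptions).

    init_values(eps_i)                     -> value vector u(0)          (line 4)
    find_strategy(u)                       -> stationary strategy pi     (line 7)
    find_step(left, d_min, u, e_psi, e_r, pi) -> (delta, eps_delta)      (line 8)
    advance(u, delta, pi, e_psi, e_r)      -> value vector u(t + delta)  (line 9)
    """
    if eps_i is None:
        eps_i = w * eps                                   # default from the text
    delta_min = (1 - w) * 2 * (eps - eps_i) / e_max ** 2 / T   # line 1
    eps_psi = eps_r = w * eps * delta_min / T                   # line 2
    t, eps_acc = 0.0, eps_i                                     # line 3
    u = init_values(eps_i)
    schedule = []          # pieces (start, end, strategy) of the scheduler
    while t < T:
        pi = find_strategy(u)
        delta, eps_delta = find_step(T - t, delta_min, u, eps_psi, eps_r, pi)
        u = advance(u, delta, pi, eps_psi, eps_r)
        schedule.append((t, t + delta, pi))
        t, eps_acc = t + delta, eps_acc + eps_delta
    return u, schedule, eps_acc


# Toy stubs that only exercise the control flow; they carry no model semantics.
values, pieces, err = switch_step(
    T=2.0, eps=0.01, e_max=4.0,
    init_values=lambda eps_i: 0.0,
    find_strategy=lambda u: "pi0",
    find_step=lambda left, d_min, u, e_psi, e_r, pi: (min(left, 0.5), 0.0),
    advance=lambda u, delta, pi, e_psi, e_r: u + delta,
)
```

Passing the procedures as parameters keeps the loop independent of how strategies and step sizes are actually computed, which mirrors the modular description in Sects. 4.1 to 4.3.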


4.1 Computing the Reachability Value 


In this section we discuss steps 4 and 9, which require computation of the reachability
probability according to the system of Equations (1)-(2). Our approach is
based on the approximation of the solution. The presence of two types of states,
probabilistic and Markovian, demands separate treatment of those. Informally,
we will combine two techniques: time-bounded reachability analysis on continuous
time Markov chains³ for Markovian states and time-unbounded reachability
analysis on discrete time Markov chains⁴ for probabilistic states. Parameters
$w$ and $\varepsilon_i$ of Algorithm 1 control the error allowed by the approximation. Here
$\varepsilon_i$ bounds the error for the very first instance of time-unbounded reachability
in line 4, while $w$ defines the fraction of the error that can be used by the
approximations in subsequent iterations ($\varepsilon_\psi$ and $\varepsilon_r$).

We start with time-unbounded reachability analysis for probabilistic states.
Let $\pi \in \Pi_{stat}$, $s, s' \in S$. We define

$$
R(s, \pi, s') =
\begin{cases}
1 & \text{if } s = s' \\
\sum_{p \in S} \mathbf{P}[s, \pi(s), p] \cdot R(p, \pi, s') & \text{else if } s \in PS \\
0 & \text{otherwise}
\end{cases}
\tag{3}
$$

3 Markov automata without probabilistic states.
4 Markov automata without Markovian states and such that $\forall s \in PS : |Act(s)| = 1$.


This value denotes the probability to reach state $s'$ starting from state $s$ by
performing any number of probabilistic transitions and no Markovian transitions.
This system of linear equations can be either solved exactly, e.g. via
Gaussian elimination, or approximated (numerical methods). If $R(s, \pi, s')$ is
under-approximated we denote it by $R_\epsilon(s, \pi, s')$, where $\epsilon$ is the approximation
error. For $A \subseteq S$ we define $R(s, \pi, A) = \sum_{s' \in A} R(s, \pi, s')$ and
$R_\epsilon(s, \pi, A) = \sum_{s' \in A} R_\epsilon(s, \pi, s')$.
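As an illustration of the numerical route, the following minimal Python sketch under-approximates $R(s, \pi, s')$ from Equation (3) by fixed-point iteration starting from 0. The data layout (`P_pi` as a dictionary of successor distributions under a fixed stationary scheduler) and the function name are assumptions made for this example only; the text itself does not prescribe a particular method.

```python
def prob_reach(P_pi, target, iterations=1000):
    """Under-approximate R(s, pi, target) of Eq. (3) by iterating from 0.

    P_pi[s] = {successor: probability} for every probabilistic state s,
    already resolved by the stationary scheduler pi. Every state that is
    not a key of P_pi is treated as Markovian: it yields 1 if it is the
    target itself and 0 otherwise.
    """
    states = set(P_pi) | {q for succ in P_pi.values() for q in succ} | {target}
    r = {s: (1.0 if s == target else 0.0) for s in states}
    for _ in range(iterations):
        nxt = {}
        for s in states:
            if s == target:
                nxt[s] = 1.0
            elif s in P_pi:
                nxt[s] = sum(p * r[q] for q, p in P_pi[s].items())
            else:
                nxt[s] = 0.0
        r = nxt
    return r


# Hypothetical example: p0 and p1 are probabilistic, m0 and m1 Markovian.
P_pi = {
    "p0": {"p1": 0.5, "m0": 0.5},
    "p1": {"m0": 0.3, "m1": 0.7},
}
to_m0 = prob_reach(P_pi, "m0")
to_m1 = prob_reach(P_pi, "m1")
```

Because the iteration starts from 0 and the update is monotone, every iterate is a lower bound on the exact value, which matches the under-approximation $R_\epsilon$ used in the text.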

For time bound 0 and $s \in PS$ the value $\mathrm{val}_s(0)$ is the optimal probability
to reach any goal state via only probabilistic transitions. We denote it by
$R^*(s, G) = \max_{\pi \in \Pi_{stat}} R(s, \pi, G)$ (step 4). It is a well-known problem on discrete
time Markov decision processes [Put94] and can be computed or approximated
by policy iteration, linear programming [Put94] or interval value iteration
[HM14,QK18,BKL+17]. If the value is approximated up to $\epsilon$, we denote it by
$R^*_\epsilon(s, G)$.

The reachability analysis on Markovian states is solved with the well-known
uniformisation approach [Jen53]. Informally, Markovian states will be implicitly
uniformised: The exit rate for each Markovian state will be equal to $E_{max}$ (by
adding a self-loop transition), but this will not affect the reachability value.

We will first define the discrete probability to reach the target vector within
$k$ Markovian transitions. Let $x \in [0,1]^{|S|}$ be a vector of values for each state.
For $k \in \mathbb{N}_{\geq 0}$, $\pi \in \Pi_{stat}$ we define $D^k_x(s, \pi) = 1$ if $s \in G$ and otherwise:

$$
D^k_x(s, \pi) =
\begin{cases}
x_s & \text{if } k = 0 \\
\sum_{s' \neq s} \frac{Q(s,s')}{E_{max}} \cdot D^{k-1}_x(s', \pi) + \left(1 - \frac{E(s)}{E_{max}}\right) \cdot D^{k-1}_x(s, \pi) & \text{if } k > 0, s \in MS \\
\sum_{s' \in MS \cup G} R(s, \pi, s') \cdot D^k_x(s', \pi) & \text{if } k > 0, s \in PS
\end{cases}
\tag{4}
$$

The value $D^k_x(s, \pi)$ is the weighted sum over all states $s'$ of the value $x_{s'}$ and the
probability to reach $s'$ starting from $s$ within $k$ Markovian transitions. Therefore
the counter $k$ decreases only when a Markovian state performs a transition
and is not affected by probabilistic transitions. If values $R(s, \pi, s')$ are approximated
up to precision $\epsilon$, i.e. $R_\epsilon(s, \pi, s')$ is used for probabilistic states instead
of $R(s, \pi, s')$ in (4), we use the notation $D^k_{x,\epsilon}(s, \pi)$.

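We denote with $\psi_\lambda$ the probability mass function of the Poisson distribution
with parameter $\lambda$. For $\tau \in \mathbb{R}_{\geq 0}$ and $\varepsilon_\psi \in (0,1]$, $N(\tau, \varepsilon_\psi)$ is some natural
number satisfying $\sum_{i=0}^{N(\tau,\varepsilon_\psi)} \psi_{E_{max}\tau}(i) \geq 1 - \varepsilon_\psi$, e.g. $N(\tau, \varepsilon_\psi) = \lceil E_{max} \cdot \tau \cdot e^2 - \ln(\varepsilon_\psi) \rceil$
[BHHK15], where $e$ is Euler's number.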
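To give a feeling for the truncation point $N(\tau, \varepsilon_\psi)$, the following Python sketch compares the closed-form choice quoted above with the smallest admissible value obtained by summing the Poisson probabilities directly. The function names and the example numbers ($E_{max} = 10$, $\tau = 0.5$, $\varepsilon_\psi = 10^{-6}$) are chosen for illustration only.

```python
import math


def poisson_pmf(lam, i):
    """psi_lam(i), evaluated in log-space to avoid overflow of lam**i and i!."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def closed_form_truncation(e_max, tau, eps):
    """The choice ceil(E_max * tau * e^2 - ln(eps)) quoted in the text."""
    return math.ceil(e_max * tau * math.e ** 2 - math.log(eps))


def smallest_truncation(e_max, tau, eps):
    """Smallest N with sum_{i=0}^{N} psi_{E_max * tau}(i) >= 1 - eps.

    The search is capped by the closed-form value, which is itself admissible.
    """
    lam = e_max * tau
    upper = closed_form_truncation(e_max, tau, eps)
    total = 0.0
    for n in range(upper + 1):
        total += poisson_pmf(lam, n)
        if total >= 1.0 - eps:
            return n
    return upper


n_closed = closed_form_truncation(10.0, 0.5, 1e-6)
n_small = smallest_truncation(10.0, 0.5, 1e-6)
```

The closed form is convenient because it needs no summation, at the price of being larger than strictly necessary; any natural number satisfying the inequality is acceptable for the definition.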

We are now in a position to describe a way to compute $u(t + \delta)$ at line 9 of
Algorithm 1. Let $u(t) \in [0,1]^{|S|}$ be a vector of values computed by the previous
iteration of Algorithm 1 for time $t$. Let $\mathrm{val}^{\pi}(t + \delta)$⁵ be the solution of the
system of Equation (1) for time point $t + \delta$, a stationary scheduler $\pi : PS \to Act$,
and where $u(t)$ is used instead of the exact values at time $t$ as the boundary condition. The
following lemma shows that $\mathrm{val}^{\pi}(t + \delta)$ can be efficiently approximated up to
$\varepsilon_\psi + \varepsilon_r$:


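Lemma 2. Let $\varepsilon_\psi \in (0,1]$, $\varepsilon_r \in [0,1]$, $\varepsilon_N = \varepsilon_r / N((T - t), \varepsilon_\psi)$ and $\delta \in [0, T - t]$.
Then $\forall s \in S : u_s(t + \delta) \leq \mathrm{val}^{\pi}_s(t + \delta) \leq u_s(t + \delta) + \varepsilon_\psi + \varepsilon_r$, where:

$$
u_s(t + \delta) =
\begin{cases}
1 & \text{if } s \in G \\
\sum_{i=0}^{N(\delta, \varepsilon_\psi)} \psi_{E_{max}\delta}(i) \cdot D^i_{u(t),\varepsilon_N}(s, \pi) & \text{else if } s \in MS \\
\sum_{s' \in MS \cup G} R_{\varepsilon_N}(s, \pi, s') \cdot u_{s'}(t + \delta) & \text{else if } s \in PS
\end{cases}
\tag{5}
$$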
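The Markovian case of Equation (5) can be tried out on a model without probabilistic states. The sketch below is a simplified illustration under stated assumptions: rates are given as a dictionary of off-diagonal entries $Q(s, s')$, $E(s)$ is taken as the sum of these rates, the truncation point is passed in directly, and the function name `step_markovian` is made up for this example.

```python
import math


def poisson_pmf(lam, i):
    """psi_lam(i), evaluated in log-space."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def step_markovian(rates, goal, u, delta, n_terms):
    """Values at time t + delta from values u at time t, Markovian states only.

    rates[s][s2] is the rate Q(s, s2) for s != s2 (every state is a key),
    goal is the set G. Implements the MS case of Eq. (5) with D^i from Eq. (4).
    """
    exit_rate = {s: sum(rates[s].values()) for s in rates}
    e_max = max(exit_rate.values())
    d = {s: (1.0 if s in goal else u[s]) for s in rates}          # D^0
    result = {s: (1.0 if s in goal else 0.0) for s in rates}
    for i in range(n_terms + 1):
        weight = poisson_pmf(e_max * delta, i)
        for s in rates:
            if s not in goal:
                result[s] += weight * d[s]
        # D^{i+1} from D^i: uniformised jump plus implicit self-loop.
        d = {
            s: (1.0 if s in goal else
                (sum(q / e_max * d[s2] for s2, q in rates[s].items())
                 + (1.0 - exit_rate[s] / e_max) * d[s]))
            for s in rates
        }
    return result


# Hypothetical chain s0 --1--> s1 --3--> g with goal g, time bound 1.
rates = {"s0": {"s1": 1.0}, "s1": {"g": 3.0}, "g": {}}
u0 = {"s0": 0.0, "s1": 0.0, "g": 1.0}
u1 = step_markovian(rates, {"g"}, u0, delta=1.0, n_terms=80)
exact = 1.0 - (3.0 * math.exp(-1.0) - math.exp(-3.0)) / 2.0
```

For this chain the time to reach the goal is the sum of two exponential delays, so the exact probability is available in closed form and the truncated sum approaches it from below.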


4.2 Choosing a Strategy 


The strategy for the next interval is computed in Step 7 and implicitly in Step 
4. The latter has been discussed in Sect. 4.1. We proceed to Step 7. 

Here we search for a strategy that remains constant for all time points within
interval $(t, t + \delta]$, for some $\delta > 0$, and introduces only an acceptable error.
Analogously to results for continuous time Markov decision processes [Mil68],
we prove that derivatives of function $u(\tau)$ at time $\tau = t$ help finding the strategy
$\pi$ that remains optimal for interval $(t, t + \delta]$, for some $\delta > 0$. This is rooted in
the Taylor expansion of function $u(t + \delta)$ via the values of $u(t)$. We define sets

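$$F_0 = \{\pi \in \Pi_{stat} \mid \forall s \in PS : \pi = \arg\max_{\pi' \in \Pi_{stat}} d^{(0)}_{\pi'}(s)\}$$

$$F_i = \{\pi \in F_{i-1} \mid \forall s \in PS : \pi = \arg\max_{\pi' \in F_{i-1}} (-1)^{i-1} d^{(i)}_{\pi'}(s)\}, \quad i \geq 1,$$

where for $\pi \in \Pi_{stat}$, $s \in G$: $d^{(0)}_\pi(s) = 1$, for $s \in MS \setminus G$: $d^{(0)}_\pi(s) = u_s(t)$, for
$s \in PS \setminus G$: $d^{(0)}_\pi(s) = \sum_{s' \in MS \cup G} R(s, \pi, s') \cdot u_{s'}(t)$ and for $i \geq 1$:

$$
d^{(i)}_\pi(s) =
\begin{cases}
0 & \text{if } s \in G \\
\sum_{s' \in S} Q(s, s') \cdot d^{(i-1)}(s') & \text{if } s \in MS \setminus G \\
\sum_{s' \in MS} R(s, \pi, s') \cdot d^{(i)}_\pi(s') & \text{if } s \in PS \setminus G
\end{cases}
\qquad d^{(i)} = d^{(i)}_\pi \text{ for any } \pi \in F_i.
$$

The value $d^{(i)}_\pi(s)$ is the $i$-th derivative of $u_s(t)$ at time $t$ for a scheduler $\pi$.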


Lemma 3. If $\pi \in F_{|S|+1}$ then $\exists \delta > 0$ such that $\pi$ is optimal on $(t, t + \delta]$.


Thus in order to compute a stationary strategy that is optimal on the time
interval $(t, t + \delta]$, for some $\delta > 0$, one needs to compute at most $|S| + 1$ derivatives
of $u(\tau)$ at time $t$. Procedure FINDSTRATEGY does exactly that. It computes sets
$F_i$ until for some $j \in 0..(|S| + 1)$ there is only 1 strategy left, i.e. $|F_j| = 1$.
Otherwise it outputs any strategy in $F_{|S|+1}$. Similarly to Sect. 4.1, the scheduler
that maximises the values $R(s, \pi, s')$ can be approximated. This question and
other optimisations are discussed in detail in Sect. 4.4.

5 $\mathrm{val}^{\pi}(t + \delta)$ may differ from the exact value at $t + \delta$, since its boundary condition $u(t)$ is only an
approximation of the exact boundary condition at time $t$.


4.3 Finding Switching Points 


Given that a strategy $\pi$ is computed by FINDSTRATEGY, we need to know for
how long this strategy can be followed before the action has to change for at least
one of the states. We consider the behaviour of the system in the time interval
$[t, T]$. Recall the function $\mathrm{val}^{\pi}(t + \delta)$, $\delta \in [0, T - t]$, defined in Sect. 4.1 (Lemma
2) as the solution of the system of Equation (1) with the boundary condition
$u(t)$, for a stationary scheduler $\pi$. For a probabilistic state $s$ the following holds:

$$
\mathrm{val}^{\pi}_s(t + \delta) = \sum_{s' \in MS \cup G} R(s, \pi, s') \cdot \mathrm{val}^{\pi}_{s'}(t + \delta)
\tag{6}
$$

Let $s \in PS$, $\pi \in \Pi_{stat}$, $\alpha \in Act(s)$. Consider the following function:

$$
\mathrm{val}^{\pi, s \to \alpha}_s(t + \delta) = \sum_{s' \in MS \cup G} \underbrace{\sum_{s'' \in S} \mathbf{P}[s, \alpha, s''] \cdot R(s'', \pi, s')}_{R_{s \to \alpha}(s, \pi, s')} \cdot \mathrm{val}^{\pi}_{s'}(t + \delta)
$$


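This function denotes the reachability value for time bound $t + \delta$ and a
scheduler that is different from $\pi$. Namely, this is a scheduler in which all
states follow strategy $\pi$, except for state $s$, which selects action $\alpha$ for the very first
transition, and afterwards selects action $\pi(s)$. Between two switching points the
strategy $\pi$ is optimal and therefore the value of $\mathrm{val}^{\pi, s \to \alpha}_s(t + \delta)$ is not greater than
$\mathrm{val}^{\pi}_s(t + \delta)$ for all $s \in PS$, $\alpha \in Act(s)$. If for some $\delta \in [0, T - t]$, $s \in PS$, $\alpha \in Act(s)$
it holds that $\mathrm{val}^{\pi, s \to \alpha}_s(t + \delta) > \mathrm{val}^{\pi}_s(t + \delta)$, then action $\alpha$ is better for $s$ than
$\pi(s)$, and therefore $\pi(s)$ is not optimal for $s$ at $t + \delta$. We show that the next
switching point after time point $t$ is such a value $t + \delta$, $\delta \in (0, T - t]$, that

$$
\begin{aligned}
&\forall s \in PS, \forall \alpha \in Act(s), \forall \tau \in [0, \delta) : \mathrm{val}^{\pi}_s(t + \tau) \geq \mathrm{val}^{\pi, s \to \alpha}_s(t + \tau) \\
&\text{and } \exists s \in PS, \alpha \in Act(s) : \mathrm{val}^{\pi}_s(t + \delta) < \mathrm{val}^{\pi, s \to \alpha}_s(t + \delta)
\end{aligned}
\tag{7}
$$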


Procedure FINDSTEP approximates switching points iteratively. It splits the
remaining time interval into subintervals $[t_1, t_2], \ldots, [t_{n-1}, t_n]$ and at each iteration
$k$ checks whether (7) holds for some $\delta \in [t_k, t_{k+1}]$. The latter is performed
by procedure CHECKINTERVAL. If (7) does not hold for any $\delta \in [t_k, t_{k+1}]$, FINDSTEP
repeats by increasing $k$. Otherwise, it outputs the largest $\delta \in [t_k, t_{k+1}]$ for which
(7) does not hold (line 11). This is done by binary search up to distance $\delta_{min}$.
Later in this section we will show that establishing that (7) does not hold for all
$\delta \in [t_k, t_{k+1}]$ can be efficiently performed by considering only 2 time points of
the interval $[t_k, t_{k+1}]$ and a subset of state-action pairs.

Algorithm 2. FINDSTEP

Input: MA $\mathcal{M} = (S, Act, \mathbf{P}, Q, G)$, time left $t \in \mathbb{Q}_{\geq 0}$, minimal step size $\delta_{min}$,
vector $u \in [0,1]^{|S|}$, $\varepsilon_\psi \in (0, 1]$, $\varepsilon_r \in [0, 1]$, $\pi \in \Pi_{stat}$
Output: step $\delta \in [\delta_{min}, t]$ and upper bound on accumulated error $\varepsilon_\delta \geq 0$

 1: if ($t < \delta_{min}$) then return $t$, $(E_{max} \cdot t)^2 / 2$
 2: $k = 1$, $t_1 = \delta_{min}$
 3: do
 4:     $t_{k+1} = \min\{t, T_\psi(k + 1, \varepsilon_\psi), (\lfloor t_k \cdot E_{max} \rfloor + 1) / E_{max}\}$
 5:     set $A = \mathcal{T}_{max}(k + 1)$ or $A = PS \times Act$    ▷ see discussion in the end of Sect. 4.3
 6:     toswitch = CHECKINTERVAL($\mathcal{M}$, $[t_k, t_{k+1}]$, $A$, $\varepsilon_\psi$, $\varepsilon_r$)
 7:     $k = k + 1$
 8: while (not toswitch) and ($t_k < t$)
 9: $k = k - 1$
10: if (toswitch = true) then
11:     find the largest $\delta \in [t_k, t_{k+1}]$, s.t. CHECKINTERVAL($\mathcal{M}$, $[t_k, \delta]$, $A$, $\varepsilon_\psi$, $\varepsilon_r$) = false
12:     if ($\delta > \delta_{min}$) then $\varepsilon_\delta = 0$ else $\varepsilon_\delta = (E_{max} \delta_{min})^2 / 2$
13:     return $\delta$, $\varepsilon_\delta$
14: else return $t$, 0


Selecting $t_k$. This step is a heuristic. The correctness of our algorithm does not
depend on the choices of $t_k$, but its runtime is supposed to benefit from it:
Obviously, the runtime of FINDSTRATEGY is best given an oracle that produces
time points $t_k$ which are exactly the switching points of the optimal strategy.
Any other heuristic is just a guess.

At every iteration $k$ we choose such a time point $t_k$ that the MA is very
likely to perform at most $k$ Markovian transitions within time $t_k$. "Very likely"
here means with probability $1 - \varepsilon_\psi$. For $k \in \mathbb{N}$ we define $T_\psi(k, \varepsilon_\psi)$ as follows:
$T_\psi(1, \varepsilon_\psi) = \delta_{min}$, and for $k > 1$: $T_\psi(k, \varepsilon_\psi)$ satisfies $\sum_{i=0}^{k} \psi_{E_{max} T_\psi(k, \varepsilon_\psi)}(i) \geq 1 - \varepsilon_\psi$.
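One possible way to obtain such a time point numerically is bisection on the Poisson parameter. The sketch below makes an assumption that the text leaves open: among all time points satisfying the inequality it picks (up to a tolerance) the largest one. The function names and the tolerance parameter are illustrative.

```python
import math


def poisson_cdf(lam, k):
    """Probability of at most k events for a Poisson distribution with parameter lam."""
    if lam == 0.0:
        return 1.0
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k + 1))


def t_psi(k, eps, e_max, delta_min, tol=1e-12):
    """A time point T with sum_{i<=k} psi_{e_max*T}(i) >= 1 - eps.

    Assumption: we return (approximately) the largest such T. For k = 1 the
    definition in the text fixes the value delta_min.
    """
    if k == 1:
        return delta_min
    lo, hi = 0.0, 1.0
    # The CDF decreases in the parameter, so grow hi until the bound fails.
    while poisson_cdf(hi, k) >= 1.0 - eps:
        hi *= 2.0
    # Invariant: the bound holds at lo and fails at hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poisson_cdf(mid, k) >= 1.0 - eps:
            lo = mid
        else:
            hi = mid
    return lo / e_max


t5 = t_psi(5, 0.01, e_max=2.0, delta_min=1e-3)
t6 = t_psi(6, 0.01, e_max=2.0, delta_min=1e-3)
```

Since this step is only a heuristic, a coarser tolerance or any smaller admissible time point would serve equally well with respect to correctness.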
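Searching for switching points within $[t_k, t_{k+1}]$. In order to check whether $\mathrm{val}^{\pi}_s(t + \delta) \geq \mathrm{val}^{\pi, s \to \alpha}_s(t + \delta)$
for all $\delta \in [t_k, t_{k+1}]$ we only have to check whether the
maximum of function $\mathrm{diff}(s, \alpha, t + \delta) = \mathrm{val}^{\pi, s \to \alpha}_s(t + \delta) - \mathrm{val}^{\pi}_s(t + \delta)$ is at most 0
on this interval for all $s \in PS$, $\alpha \in Act(s)$. In order to achieve this we work on
the approximation of $\mathrm{diff}(s, \alpha, t + \delta)$ derived from Lemma 2, thus establishing a
sufficient condition for the scheduler to remain optimal:

$$
\begin{aligned}
\mathrm{val}^{\pi, s \to \alpha}_s(t + \delta) &= \sum_{s' \in MS \cup G} R_{s \to \alpha}(s, \pi, s') \cdot \mathrm{val}^{\pi}_{s'}(t + \delta) \\
&\leq \sum_{s' \in MS \setminus G} R_{s \to \alpha, \varepsilon_N}(s, \pi, s') \sum_{i=0}^{k} \psi_{E_{max}\delta}(i) \cdot D^i_{u(t), \varepsilon_N}(s', \pi) \\
&\quad + R_{s \to \alpha, \varepsilon_N}(s, \pi, G) + \varepsilon_\psi + \varepsilon_r
\end{aligned}
\tag{8}
$$

Here $R_{s \to \alpha, \varepsilon_N}(s, \pi, s')$ ($R_{s \to \alpha, \varepsilon_N}(s, \pi, G)$) denotes an under-approximation
of the value $R_{s \to \alpha}(s, \pi, s')$ ($R_{s \to \alpha}(s, \pi, G)$ resp.) up to $\varepsilon_N$, defined in Lemma 2.
And analogously for $\mathrm{val}^{\pi}_s(t + \delta)$. Simple rewriting leads to the following:

$$
\mathrm{val}^{\pi, s \to \alpha}_s(t + \delta) - \mathrm{val}^{\pi}_s(t + \delta) \leq \sum_{i=0}^{k} \psi_{E_{max}\delta}(i) \cdot B^i_{\pi, \varepsilon_N}(s, \alpha) + C_{\pi, \varepsilon_N}(s, \alpha),
\tag{9}
$$

where $B^i_{\pi, \varepsilon_N}(s, \alpha) = \sum_{s' \in MS \setminus G} (R_{s \to \alpha, \varepsilon_N}(s, \pi, s') - R_{\varepsilon_N}(s, \pi, s')) \cdot D^i_{u(t), \varepsilon_N}(s', \pi)$
and $C_{\pi, \varepsilon_N}(s, \alpha) = R_{s \to \alpha, \varepsilon_N}(s, \pi, G) - R_{\varepsilon_N}(s, \pi, G) + \varepsilon_\psi + \varepsilon_r$. In order to find the
supremum of the right-hand side of (9) over all $\delta \in [a, b]$ we search for the extremum
of each $y_i(\delta) = \psi_{E_{max}\delta}(i) \cdot B^i_{\pi, \varepsilon_N}(s, \alpha)$, $i = 0..k$, separately as a function of $\delta$.
Simple derivative analysis shows that the extremum of these functions is achieved
at $\delta = i / E_{max}$. Truncation of the time interval by $(\lfloor t_k \cdot E_{max} \rfloor + 1) / E_{max}$ (step
4, Algorithm 2) ensures that for all $i = 0..k$ the extremum of $y_i(\delta)$ is attained
at either $\delta = t_k$ or $\delta = t_{k+1}$.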


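Lemma 4. Let $[t_k, t_{k+1}]$ be the interval considered by CHECKINTERVAL at iteration
$k$. $\forall \delta \in [t_k, t_{k+1}]$, $s \in PS$, $\alpha \in Act$:

$$
\mathrm{diff}(s, \alpha, t + \delta) \leq \sum_{i=0}^{k} \psi_{E_{max}\delta(s, \alpha, i)}(i) \cdot B^i_{\pi, \varepsilon_N}(s, \alpha) + C_{\pi, \varepsilon_N}(s, \alpha),
\tag{10}
$$

where

$$
\delta(s, \alpha, i) =
\begin{cases}
t_k & \text{if } B^i_{\pi, \varepsilon_N}(s, \alpha) \geq 0 \text{ and } i / E_{max} \leq t_k \\
    & \text{or } B^i_{\pi, \varepsilon_N}(s, \alpha) \leq 0 \text{ and } i / E_{max} > t_k \\
t_{k+1} & \text{otherwise}
\end{cases}
$$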
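The endpoint selection of Lemma 4 can be checked numerically on made-up coefficients. In the sketch below the values of $B^i$ and $C$ are arbitrary illustrative numbers rather than quantities computed from a model, and the function names are hypothetical; the interval is truncated as in step 4 of Algorithm 2 so that no point $i / E_{max}$ lies strictly inside it.

```python
import math


def poisson_pmf(lam, i):
    """psi_lam(i), evaluated in log-space."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def endpoint(b_i, i, e_max, t_k, t_k1):
    """delta(s, alpha, i) of Lemma 4 for one coefficient B^i."""
    peak = i / e_max
    if (b_i >= 0 and peak <= t_k) or (b_i <= 0 and peak > t_k):
        return t_k
    return t_k1


def rhs_of_10(b, c, e_max, t_k, t_k1):
    """Right-hand side of (10) for coefficients b = [B^0, ..., B^k] and c = C."""
    return c + sum(
        poisson_pmf(e_max * endpoint(b_i, i, e_max, t_k, t_k1), i) * b_i
        for i, b_i in enumerate(b)
    )


e_max, t_k = 2.0, 0.6
t_k1 = (math.floor(t_k * e_max) + 1) / e_max      # truncation from step 4
b, c = [0.3, -0.2, 0.1, -0.05], -0.01             # illustrative numbers only

bound = rhs_of_10(b, c, e_max, t_k, t_k1)
grid = [t_k + j * (t_k1 - t_k) / 1000 for j in range(1001)]
brute = max(
    c + sum(poisson_pmf(e_max * d, i) * b_i for i, b_i in enumerate(b))
    for d in grid
)
```

The bound takes the maximum of every summand separately, so it can be strictly larger than the true maximum of the sum; this is the source of the over-approximation discussed next.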


CHECKINTERVAL returns false iff for all $s \in PS$, $\alpha \in Act$ the right-hand side
of (10) is less than or equal to 0. Since Lemma 4 over-approximates $\mathrm{diff}(s, \alpha, t + \delta)$, false
positives are inevitable. Namely, it is possible that procedure CHECKINTERVAL
suggests that there exists a switching point within $[t_k, t_{k+1}]$, while in reality
there is none. This however does not affect the correctness of the algorithm, only
its running time.


Finding Maximal Transitions. Here we show that there exists a subset of states, 
such that, if the optimal strategy for these states does not change on an interval, 
then the optimal strategy for all states does not change on this interval. 

In the following we call a pair $(s, \alpha) \in PS \times Act$ a transition. For transitions
$(s, \alpha), (s', \alpha') \in PS \times Act$ we write $(s, \alpha) \leq_k (s', \alpha')$ iff $C_{\pi, \varepsilon_N}(s, \alpha) \leq C_{\pi, \varepsilon_N}(s', \alpha')$
and $\forall i = 0..k : B^i_{\pi, \varepsilon_N}(s, \alpha) \leq B^i_{\pi, \varepsilon_N}(s', \alpha')$. We say that a transition $(s, \alpha)$ is
maximal if there exists no other transition $(s', \alpha')$ that satisfies the following:
$(s, \alpha) \leq_k (s', \alpha')$ and at least one of the following conditions holds: $C_{\pi, \varepsilon_N}(s, \alpha) < C_{\pi, \varepsilon_N}(s', \alpha')$
or $\exists i = 0..k : B^i_{\pi, \varepsilon_N}(s, \alpha) < B^i_{\pi, \varepsilon_N}(s', \alpha')$. The set of all maximal
transitions is denoted with $\mathcal{T}_{max}(k)$.

We prove that if inequality (10) holds for all transitions from $\mathcal{T}_{max}(k)$, then it
holds for all transitions. Thus only transitions from $\mathcal{T}_{max}(k)$ have to be checked
by procedure CHECKINTERVAL. In our implementation we only compute $\mathcal{T}_{max}(k)$
before the call to CHECKINTERVAL at line 11 of Algorithm 2, and use the set
$A = PS \times Act$ within the while-loop.
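The set of maximal transitions is a Pareto front over the coefficient vectors $(C, B^0, \ldots, B^k)$. A straightforward quadratic-time filter can be sketched as follows; the dictionary layout and the coefficient values are assumptions for illustration, and the authors' implementation may compute the set differently.

```python
def dominated(v, w):
    """True if w is at least v in every component and strictly larger in one."""
    return (all(a <= b for a, b in zip(v, w))
            and any(a < b for a, b in zip(v, w)))


def maximal_transitions(coeffs):
    """Transitions whose coefficient vector is not dominated by any other.

    coeffs maps a transition (s, alpha) to the tuple (C, B^0, ..., B^k).
    """
    return {
        tr for tr, v in coeffs.items()
        if not any(dominated(v, w) for other, w in coeffs.items() if other != tr)
    }


# Illustrative coefficients for four transitions with k = 1.
coeffs = {
    ("s", "a"): (0.1, 0.2, 0.3),
    ("s", "b"): (0.1, 0.1, 0.3),   # dominated by ("s", "a")
    ("p", "a"): (0.0, 0.5, 0.0),
    ("p", "b"): (0.1, 0.2, 0.3),   # equal to ("s", "a"), hence not dominated
}
t_max = maximal_transitions(coeffs)
```

Transitions with identical coefficient vectors are all kept, which agrees with the definition above: domination requires a strict inequality in at least one component.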


4.4 Optimisation for Large Models 


Here we discuss a number of implementation improvements developers should 
consider when applying our algorithm to large case studies: 


Switching points. It may happen that the optimal strategy switches very often
on a time interval, while the effect of these frequent switches is negligible. The
difference may be so small that the $\varepsilon$-optimal strategy actually stays stationary
on this interval. In addition, floating-point computations may lead to imprecise
results: Values that are 0 in theory might be represented by non-zero floating-point
numbers, making it seem as if the optimal strategy changed its decision, when
in fact it did not. To counteract these issues we can modify CHECKINTERVAL
such that it outputs false even if the right-hand side of (10) is positive, as long
as it is sufficiently small. The following lemma proves that the error introduced
by not switching the decision is acceptable:


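Lemma 5. Let $\delta = t_{k+1} - t_k$, $\varepsilon' = \varepsilon - \varepsilon_i$, $\hat{\varepsilon} \in (0, \varepsilon' \cdot \delta / T)$ and $N(\delta, \hat{\varepsilon}) = (E_{max}\delta)^2 / 2.0 / \hat{\varepsilon}$.
If $\forall s \in PS, \alpha \in Act, \tau \in [t_k, t_{k+1}]$ the right-hand side of (10) is
not greater than $(\varepsilon' \delta / T - \hat{\varepsilon}) / N(\delta, \hat{\varepsilon})$, then $\pi$ is $\varepsilon' \delta / T$-optimal in $[t_k, t_{k+1}]$.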


Optimal strategy. In some cases computation of the optimal strategy in the way
it was described in Sect. 4.2 is computationally expensive, or is not possible at
all. For example, if some values $|d^{(i)}_\pi(s)|$ are larger than the maximal floating-point
number that a computer can store, or if the computation of $|S| + 1$ derivatives
is already prohibitive for models with a large state space, or if the values
$R(s, \pi, s')$ can only be approximated and not computed precisely. With the help
of Lemma 5 and minor modifications to Algorithm 1, the correctness and convergence
of Algorithm 1 can be preserved even when the strategy computed by
FINDSTRATEGY is not guaranteed to be optimal.


5 Empirical Evaluation 


We implemented our algorithm as a part of IMCA [GHKN12]. Experiments were
conducted as single-thread processes on an Intel Core i7-4790 with 32GB of
RAM. We compare the algorithm presented in this paper with [Neu10] and
[BHHK15]. Both are available in IMCA. We use the following abbreviations
to refer to the algorithms: FixStep for [Neu10], Unif+ for [BHHK15] and
SwitchStep for Algorithm 1. The value of the parameter $w$ in Algorithm 1
is set to 0.1, $\varepsilon_i = 0$. We keep the default values of all other algorithms.

Table 1. The discretisation step used in some of the experiments shown in Fig. 3.

        | δ_F        | min δ_S    | avg δ_S | max δ_S | T  | precision
dpm-5-2 | 3.7·10^-6  | 3.65·10^-° | 0.27    | 3.97    | 15 | 0.001
qs-2-3  | 1.04·10^-° | 1.04·10^-° | 0.017   | 7.56    | 15 | 0.001
ps-2-6  | 3.54·10^-° | 0.0003     | 6       | 17.4    | 18 | 0.001


The evaluation is performed on a set of published benchmarks: 

dpm-j-k: A model of a dynamic power management system [QWP99], repre- 
senting the internals of a Fujitsu disk drive. The model contains a queue, service 
requester, service provider and a power manager. The requester generates tasks 
of j types differing in energy requirements, that are stored in the queue of size 
k. The power manager selects the processing mode for the service provider. A 
state is a goal state if the queue of at least one task type is full. 

qs-j-k and ps-j-k: Models of a queuing system [HH12] and a polling system 
[GHH+13] where incoming requests of j types are buffered in two queues of size 
k each, until they are processed by the server. We consider the state with both 
queues being full to form the goal state set. 

The memory required by all three algorithms is polynomial in the size of the
model. For the evaluation we therefore concentrate on runtime only. We set the
time limit for the experiments to 15 minutes. Timeouts are marked by x in the
plots. Runtimes are given in seconds. All plots use log-log axes.


Results 


SwitchStep vs FixStep. Figure 3 compares runtimes of SwitchStep and FixStep. For
these experiments precision is set to $10^{-3}$ and the state space size ranges from $10^2$
to 10°.

Fig. 3. Running time comparison of FixStep and SwitchStep. (Horizontal axis: SwitchStep; vertical axis: FixStep.)

This plot represents the general trend observed in many experiments: The algorithm
FixStep does not scale well with the size of the problem (state space, precision,
time bound). For larger benchmarks it usually required more than 15 minutes. This is
likely due to the fact that the discretisation step used by FixStep is very small, which
means that the algorithm performs many iterations. In fact Table 1 reports on the size
of the discretisation steps for both FixStep and SwitchStep on a few benchmarks.
Here the column δ_F shows the length of the discretisation step of FixStep.
As we mentioned in Sect. 3, this step is fixed for the selected values of time
bound and precision. Columns min δ_S, avg δ_S and max δ_S show minimal, average
and maximal steps used by SwitchStep respectively. The average step used by
SwitchStep is several orders of magnitude larger than that of FixStep. Therefore
SwitchStep performs far fewer iterations. Even though each iteration takes
longer, the significant decrease in the number of iterations leads to a much
smaller total runtime.

SwitchStep vs Unif+. In order to compare SwitchStep with Unif+ we restrict the
benchmarks to a subclass of Markov automata in which probabilistic and Markovian
states alternate, and probabilistic states have only 1 successor for each
action. This is due to the fact that Unif+ is available in IMCA only for this
subclass of models.

Table 2. Parameters of the experiments shown in Fig. 4.

              | |S|           | |Act|  | E_max       | T
dpm-[4..7]-2  | 2061 - 158,208 | 4 - 7  | 4.6 - 9.1   | 15
dpm-3-[2..20] | 412 - 115,108  | 3      | 3.3         | 100
qs-1-[2..7]   | 124 - 3,614    | 4 - 14 | 11.3 - 35.3 | 6
qs-[1..4]-2   | 124 - 16,924   | 4 - 8  | 11.3        | 6
ps-[1..8]-2   | 47 - 156,315   | 3 - 8  | 3.6 - 257.6 | 18
ps-2-[1..7]   | 65 - 743,969   | 2 - 4  | 48 - 56     | 18

Fig. 4. Running times of algorithms SwitchStep and Unif+. (Two log-log panels with SwitchStep on the horizontal axis; benchmark families dpm, qs and ps.)


Figure 4 shows the comparison of running times of SwitchStep and Unif+.
For the plot on the left we varied those model parameters that affect state space
size, number of non-deterministic actions and maximal exit rate. In the plot on
the right the model parameters are fixed, but the precision and time bounds used
for the experiments differ. Table 2 shows the parameters of the models
used in these experiments. We observe that there are cases in which SwitchStep
performs remarkably better than Unif+, and cases of the opposite. Consider the
experiments in Fig. 4, right. They show that Unif+ may be highly sensitive to
variations of time bounds and precision, while SwitchStep is more robust in this
respect. This is due to the fact that the scheduler computed by Unif+ does not
have means to observe time precisely and can only guess it. This may be good
enough, which is the case on the ps benchmark. However if it is not, then better
precision will require many more computations. Additionally Unif+ does not use
discretisation. This means that the increase of the time bound from $T_1$ to $T_2$ may
significantly increase the overall running time, even if no new switching points
appear on the interval $[T_1, T_2]$. SwitchStep does not suffer from these issues due
to the fact that it considers schedulers that observe the time precisely and uses
the discretisation. Large time intervals that introduce no switching points will
likely be handled within one iteration.

In general, SwitchStep performs at its best when there are not too many 
switching points, which is what is observed in most published case studies. 


Conclusions: We conclude that SwitchStep does not replace all existing algorithms
for time-bounded reachability. However it does improve the state of the
art in many cases and thus occupies its own niche among available solutions.

Abstract. Parametric timed automata (PTA) extend timed automata
by allowing parameters in clock constraints. Such a formalism is for
instance useful when reasoning about unknown delays in a timed system.
Using existing techniques, a user can synthesize the parameter constraints
that allow the system to reach a specified goal location, regardless
of how much time has passed for the internal clocks.

We focus on synthesizing parameters such that not only the goal loca- 
tion is reached, but we also address the following questions: what is the 
minimal time to reach the goal location? and for which parameter val- 
ues can we achieve this? We analyse the problem and present a semi- 
algorithm to solve it. We also discuss and provide solutions for minimiz- 
ing a specific parameter value to still reach the goal. 

We empirically study the performance of these algorithms on a bench- 
mark set for PTAs and show that minimal-time reachability synthesis 
is more efficient to compute than the standard synthesis algorithm for 
reachability. Data or code related to this paper is available at: [26]. 


1 Introduction 


Timed Automata (TA) [2] extend finite automata with clocks, for instance to
model real-time systems. Timed automata allow for reasoning about temporal
properties of the designed system. In addition to reachability problems, it is
possible to compute for TAs the minimal or maximal time required to reach a
specific goal location. Such a result is valuable in practice, as it can describe the
response time of a system or it may indicate when a component failure occurs.

Fig. 1. Train delay scheduling problem: Alice (depicted in dotted red), located at A,
wants to go to station D. Bob (depicted in dashed blue), located at B, wants to go to A.
By setting the train delays $D_1$ and $D_2$ for train 1 and 2, make sure that both Alice
and Bob reach their target station in minimum total time. (Color figure online)


It may not always be possible to describe a real-time system with a TA. 
There are often uncertainties in the timing constraints, for instance how long 
it takes between sending and receiving a message. Optimising specific timing 
delays to improve the overall throughput of the system may also be considered, 
as shown in Example 1. Such uncertainties can however be modelled using a 
parametric timed automaton (PTA) [3]. A PTA adds parameters, or unknown 
constants, to the TA formalism. By examining the reachability of a goal location, 
the parameters get constrained and we can observe which parameter valuations 
preserve the reachability of the goal location. 

This process, also called parameter synthesis, is definitely useful for analysing 
reachability properties of a system. However, this technique does disregard tim- 
ing aspects to some extent. Given the parameter constraints, it is no longer pos- 
sible to give clear boundaries on the time to reach the goal, as this may depend 
on the parameter valuations. We focus on the parameter synthesis problem while 
reaching the goal location in minimal time, as demonstrated in Example 1. 


Example 1. Consider the example in Fig. 1, which depicts a train network consisting
of two trains. Both trains share locations B and D (the station platforms)
while locations A', B', C', D', B'', and D'' represent a train travelling (tracks). The
travel time for train 1 between any two stations is 100, and 55 for train 2. Train 1
stops at stations A, B, C, and D, for time $D_1$ (and train 2 stops for $D_2$ time units
at B and D). Here, the train delays $D_1$ and $D_2$ are parameters and $x_1$ and $x_2$ are
clocks. Both clocks start at 0 and reset after every transition. We assume that
the trains use different tracks and changing trains at the platform of a station
can be done in negligible time.

Alice is starting her journey from A and would like to go to D. Bob is located
at B and wants to go to A. Train 1 and/or 2 can be used to travel, if both the
train and the person are at the same location. Initially, both Alice and Bob wait
for a train, since the initial positions of train 1 and 2 are respectively C' and D''.
We would like to set the train delays $D_1$ and $D_2$ in such a way that the total
time for Alice and Bob to reach their target location, i.e. the PTA location for
which Alice is at station D and Bob is at station A, is minimal. The optimal
solution is $D_1 = 25 \wedge D_2 = 15$, which leads to a total time of 405 units¹. Note
that this is neither optimal for Alice (the fastest would be $D_1 = 0 \wedge D_2 = 5$),
nor optimal for Bob ($D_1 = 10 \wedge D_2 = 0$).


Note that in other instances, the time to reach a goal location may be an
interval, describing the lower- and upper-bound on the time. This can be achieved
in the example by changing the travel time of train 1 to be between 95 and
105, by guarding the outgoing transitions from locations A', B', C' and D' with
$95 \leq x_1 \leq 105$ (instead of $x_1 = 100$). We focus on the lower-bound global time,
meaning that we look at the minimal total time passed in the system, which
may differ from the clock values as the clocks can be reset.

In this paper, we address the following problems: 


— minimal-time reachability: synthesizing a single parameter valuation for 
which the goal location can be reached in minimal (lower-bound) time, 

— minimal-time reachability synthesis: synthesizing all parameter valuations 
such that the time to reach the goal location is minimized, and 

— parameter minimization synthesis: synthesizing all parameter valuations such 
that a particular parameter is minimized and the goal location can still be 
reached (this problem can also address the minimal-time reachability synthesis 
problem by adding a parameter to equal with the final clock value). 


For all stated problems we provide algorithms to solve them and empirically 
compare them with a set of benchmark experiments for PTAs, obtained from [5]. 
Interestingly, compared to standard reachability and synthesis, minimal-time 
reachability and synthesis is in general computed faster as fewer states have 
to be considered in the exploration. We also look at the computability and 
intractability of the problems for PTAs and L/U-PTAs (PTAs for which each 
parameter only appears as a lower- or upper-bound). 


Related work. The earliest work on minimal-time reachability for timed
automata was by Courcoubetis and Yannakakis [17], who first addressed the problem
of computing lower and upper bounds. Several algorithms have been developed
since to improve performance [22,24,25], e.g. by using parallelism. Related
problems have been studied, such as minimal-time reachability for weighted
timed automata [4], minimal-cost reachability in priced timed automata [12],
and job scheduling for timed automata [1].

Concerning parametric timed automata, to the best of our knowledge, the
minimal-time reachability problem was not tackled in the past. The reachability-emptiness
problem ("the emptiness of the parameter valuation set for which a
given set of locations is reachable") is undecidable [3], with various settings considered,
notably a single clock compared to parameters [21] or a single rational-valued
or integer-valued parameter [14,21] (see [6] for a survey). Only severely
limiting the number of clocks (e.g. [3,11,14,16]), and often restricting to integer-valued
parameters, can bring some decidability. Emptiness for the subclass of
L/U-PTAs is also decidable [13]. Minimizing a parameter can however be considered
solved in the setting of upper-bound PTAs (PTAs in which the clocks are
only restricted from above): the exact synthesis of integer valuations for which
a location is reachable can be done [15], and therefore the minimum valuation
of a parameter can be obtained.

1 Alice waits for train 1 to reach A at time 225, then she hops on and exits the train
at time 350 at B. There she can immediately take train 2 and reach D at time 405.
Bob waits for train 2 to reach B at time 55 and takes this train. At time 125 he
reaches D and can immediately hop on train 1. Bob reaches A at time 225.


2 Preliminaries 


We assume a set $X = \{x_1, \dots, x_{|X|}\}$ of clocks, i.e. real-valued
variables that evolve at the same rate. A clock valuation is
$v_X : X \to \mathbb{R}_{\geq 0}$. We write $\vec{0}$ for the clock valuation
assigning 0 to all clocks. Given $d \in \mathbb{R}_{\geq 0}$, $v_X + d$ is
the valuation s.t. $(v_X + d)(x) = v_X(x) + d$, for all $x \in X$. Given
$R \subseteq X$, we define the reset of a valuation $v_X$, denoted by
$[v_X]_R$, as follows: $[v_X]_R(x) = 0$ if $x \in R$, and
$[v_X]_R(x) = v_X(x)$ otherwise.
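
As a small illustration of the two operations on clock valuations defined above, the following Python sketch represents a valuation as a dictionary from clock names to values. The function names `delay` and `reset`, the clock names, and the use of `Fraction` are assumptions made for this example only.

```python
from fractions import Fraction

def delay(valuation, d):
    """Return v_X + d: every clock advances by the same amount d >= 0."""
    return {x: value + d for x, value in valuation.items()}

def reset(valuation, clocks):
    """Return [v_X]_R: clocks in R are set to 0, the others are unchanged."""
    return {x: (Fraction(0) if x in clocks else value)
            for x, value in valuation.items()}

v = {"x1": Fraction(0), "x2": Fraction(3, 2)}
v_delayed = delay(v, Fraction(1, 2))    # x1 = 1/2, x2 = 2
v_reset = reset(v_delayed, {"x2"})      # x1 = 1/2, x2 = 0
```

Both functions return a new dictionary, so the original valuation can still be used afterwards.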

We assume a set $P = \{p_1, \dots, p_{|P|}\}$ of parameters. A parameter
valuation $v_P$ is $v_P : P \to \mathbb{Q}_+$. We denote
$\bowtie \in \{<, \leq, =, \geq, >\}$, $\triangleleft \in \{<, \leq\}$, and
$\triangleright \in \{\geq, >\}$. A guard $g$ is a constraint over $X \cup P$
defined by a conjunction of inequalities of the form $x \bowtie d$ or
$x \bowtie p$, with $x \in X$, $d \in \mathbb{N}$ and $p \in P$. Given a
guard $g$, we write $v_X \models v_P(g)$ if the expression obtained by
replacing each clock $x \in X$ appearing in $g$ by $v_X(x)$ and each
parameter $p \in P$ appearing in $g$ by $v_P(p)$ evaluates to true.
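
The satisfaction of a guard by a pair of valuations can be sketched as follows. This is only an illustration under stated assumptions: a guard is encoded as a list of atoms `(clock, operator, bound)`, a bound given as a string is read as a parameter name, and the names `OPS` and `satisfies` are hypothetical.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def satisfies(clock_val, param_val, guard):
    """Check v_X |= v_P(g) for a guard given as a list of atoms (x, op, bound),
    where bound is either a natural number or the name of a parameter."""
    def value(bound):
        # A parameter name is replaced by its valuation; a constant is kept.
        return param_val[bound] if isinstance(bound, str) else bound
    return all(OPS[op](clock_val[x], value(bound)) for x, op, bound in guard)

guard = [("x", ">=", 1), ("x", "<=", "p")]          # 1 <= x <= p
ok = satisfies({"x": Fraction(2)}, {"p": Fraction(5, 2)}, guard)
ko = satisfies({"x": Fraction(2)}, {"p": Fraction(1)}, guard)
```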


2.1 Parametric Timed Automata 


Definition 1 (PTA). A PTA $\mathcal{A}$ is a tuple
$\mathcal{A} = (\Sigma, L, \ell_0, X, P, I, E)$, where: (i) $\Sigma$ is a
finite set of actions, (ii) $L$ is a finite set of locations, (iii)
$\ell_0 \in L$ is the initial location, (iv) $X$ is a finite set of clocks,
(v) $P$ is a finite set of parameters, (vi) $I$ is the invariant, assigning
to every $\ell \in L$ a guard $I(\ell)$, (vii) $E$ is a finite set of edges
$e = (\ell, g, a, R, \ell')$ where $\ell, \ell' \in L$ are the source and
target locations, $a \in \Sigma$, $R \subseteq X$ is a set of clocks to be
reset, and $g$ is a guard.


Given a parameter valuation $v_P$ and a PTA $\mathcal{A}$, we denote by
$v_P(\mathcal{A})$ the non-parametric structure where all occurrences of a
parameter $p \in P$ have been replaced by $v_P(p)$. Any structure
$v_P(\mathcal{A})$ is also a timed automaton. By assuming a rescaling of the
constants (multiplying all constants in $v_P(\mathcal{A})$ by their least
common denominator), we obtain an equivalent (integer-valued) TA.
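
The rescaling step can be illustrated with a short Python sketch. It assumes the constants of the valuated automaton are collected in a list of rationals; the function name `rescale` is hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9

def rescale(constants):
    """Multiply all constants by the least common multiple of their
    denominators, so that every constant becomes an integer."""
    fractions = [Fraction(c) for c in constants]
    factor = lcm(*(f.denominator for f in fractions))
    return factor, [int(f * factor) for f in fractions]

# Constants 1/2, 3/4 and 2 share the common denominator 4.
factor, scaled = rescale([Fraction(1, 2), Fraction(3, 4), 2])
```

All clocks evolve at the same rate, so multiplying every constant by the same positive factor only changes the time unit.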


Definition 2 (L/U-PTA). An L/U-PTA is a PTA where the set of parameters is
partitioned into lower-bound parameters and upper-bound parameters, i.e.
parameters that appear only in guards and invariants in inequalities of the
form $p \triangleleft x$, or of the form $p \triangleright x$, respectively.


Definition 3 (Semantics of a PTA). Given a PTA
$\mathcal{A} = (\Sigma, L, \ell_0, X, P, I, E)$, and a parameter valuation
$v_P$, the semantics of $v_P(\mathcal{A})$ is given by the timed transition
system (TTS) $(S, s_0, \to)$, with:

- $S = \{(\ell, v_X) \in L \times \mathbb{R}_{\geq 0}^{|X|} \mid
  v_X \models v_P(I(\ell))\}$, $s_0 = (\ell_0, \vec{0})$,
- $\to$ consists of the discrete and (continuous) delay transition relations:
  (i) discrete transitions: $(\ell, v_X) \xrightarrow{e} (\ell', v_X')$, if
  $(\ell, v_X), (\ell', v_X') \in S$, and there exists
  $e = (\ell, g, a, R, \ell') \in E$, such that $v_X' = [v_X]_R$, and
  $v_X \models v_P(g)$, (ii) delay transitions:
  $(\ell, v_X) \xrightarrow{d} (\ell, v_X + d)$, with
  $d \in \mathbb{R}_{\geq 0}$, if $\forall d' \in [0, d]$,
  $(\ell, v_X + d') \in S$.

Moreover we write $(\ell, v_X) \xrightarrow{(d, e)} (\ell', v_X')$ for a
combination of a delay and discrete transition if
$\exists v_X'' : (\ell, v_X) \xrightarrow{d} (\ell, v_X'') \xrightarrow{e}
(\ell', v_X')$.


Given a TA $v_P(\mathcal{A})$ with concrete semantics $(S, s_0, \to)$, we
refer to the states of $S$ as the concrete states of $v_P(\mathcal{A})$. A
run $\rho$ of $v_P(\mathcal{A})$ is a possibly infinite alternating sequence
of concrete states of $v_P(\mathcal{A})$, and pairs of edges and delays,
starting from the initial state $s_0$, of the form
$s_0, (d_0, e_0), s_1, \cdots$, with $i = 0, 1, \dots$, and
$d_i \in \mathbb{R}_{\geq 0}$, $e_i \in E$, and
$(s_i, e_i, s_{i+1}) \in \to$. The set of all finite runs over
$v_P(\mathcal{A})$ is denoted by $Runs(v_P(\mathcal{A}))$. The duration of a
finite run $\rho = s_0, (d_0, e_0), s_1, \cdots, s_i$ is given by
$duration(\rho) = \sum_{0 \leq j \leq i-1} d_j$.

Given a state $s = (\ell, v_X)$, we say that $s$ is reachable in
$v_P(\mathcal{A})$ if $s$ is the last state of a run of $v_P(\mathcal{A})$.
By extension, we say that $\ell$ is reachable; and by extension again, given
a set $T$ of locations, we say that $T$ is reachable if there exists
$\ell \in T$ such that $\ell$ is reachable in $v_P(\mathcal{A})$. The set of
all finite runs of $v_P(\mathcal{A})$ that reach $T$ is denoted by
$Reach(v_P(\mathcal{A}), T)$.


Minimal reachability. As the minimal time may not be an integer, but may
also be the smallest value larger than an integer², we define a minimum as
either a pair in $\mathbb{Q}_+ \times \{=, >\}$ or $\infty$. The comparison
operators function as follows: $(c, =) < \infty$, $(c, >) < \infty$, and
$(c_1, \succ_1) < (c_2, \succ_2)$ iff either $c_1 < c_2$, or $c_1 = c_2$,
$\succ_1$ is $=$ and $\succ_2$ is $>$.
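
The ordering on minima can be sketched in Python by relying on lexicographic tuple comparison. The encoding below is an assumption of this example: $(c, =)$ becomes `(c, False)`, $(c, >)$ becomes `(c, True)`, and $\infty$ becomes `(math.inf, False)`; the helper name `minimum` is hypothetical.

```python
import math
from fractions import Fraction

# A minimum (c, =) is encoded as (c, False) and (c, >) as (c, True).
# Python compares tuples lexicographically and False < True, which matches
# the ordering on minima: (c, =) is smaller than (c, >).
INFINITY = (math.inf, False)

def minimum(c, strict):
    """Build the encoded minimum for the rational c."""
    return (Fraction(c), strict)

a = minimum(2, False)   # (2, =)
b = minimum(2, True)    # (2, >)
c = minimum(3, False)   # (3, =)
smallest = min([b, c, a, INFINITY])
```

With this encoding the built-in `<`, `==` and `min` can be used directly wherever two minima have to be compared.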

Given a set of locations $T$, the minimal time reachability of $T$ in
$v_P(\mathcal{A})$, denoted by $MinTimeReach(v_P(\mathcal{A}), T) =
\min\{duration(\rho) \mid \rho \in Reach(v_P(\mathcal{A}), T)\}$, is the
minimal duration over all runs of $v_P(\mathcal{A})$ reaching $T$.

By extension, given a PTA, we denote by $MinTimePTA(\mathcal{A}, T)$ the
minimal time reachability of $T$ over all valuations, i.e.
$MinTimePTA(\mathcal{A}, T) = \min_{v_P} MinTimeReach(v_P(\mathcal{A}), T)$.
As we will be interested in synthesizing the valuations leading to the
minimal time, let us define $MinTimeSynth(\mathcal{A}, T) = \{v_P \mid
MinTimeReach(v_P(\mathcal{A}), T) = MinTimePTA(\mathcal{A}, T)\}$.

We will also be interested in minimizing the valuation of a given parameter
$p_i$ (without any notion of time) reaching a given location, and we
therefore define $MinParamReach(\mathcal{A}, p_i, T) = \min_{v_P}\{v_P(p_i)
\mid Reach(v_P(\mathcal{A}), T) \neq \emptyset\}$. Similarly, we will be
interested in synthesizing all valuations leading to the minimal valuation of
$p_i$ reaching $T$, so let us define $MinParamSynth(\mathcal{A}, p_i, T) =
\{v_P \mid Reach(v_P(\mathcal{A}), T) \neq \emptyset \wedge v_P(p_i) =
MinParamReach(\mathcal{A}, p_i, T)\}$.


2 Consider a TA with a transition guarded by $x > 1$ from $\ell_0$ to
$\ell_1$; then the minimal duration of runs reaching $\ell_1$ is not 1 but
slightly more.

3 When we compute the minimum over a set, we actually calculate its infimum
and combine the value with either $=$ or $>$ to indicate if the value is
present in the set.


2.2 Computation Problems 


Minimal-time reachability problem:
INPUT: A PTA $\mathcal{A}$, a subset $T \subseteq L$ of its locations.
PROBLEM: Compute $MinTimePTA(\mathcal{A}, T)$.


Minimal-time reachability synthesis problem:
INPUT: A PTA $\mathcal{A}$, a subset $T \subseteq L$ of its locations.
PROBLEM: Compute $MinTimeSynth(\mathcal{A}, T)$.


Before addressing these problems, we will address the slightly different
problem of minimal-parameter reachability, i.e. the minimization of a
parameter reaching a given location (independently of time). We will see in
Lemma 1 that this problem can also give an answer to the minimal-time
reachability (synthesis) problem.

Minimal-parameter reachability problem:
INPUT: A PTA $\mathcal{A}$, a parameter $p$, a subset $T \subseteq L$ of the
locations of $\mathcal{A}$.
PROBLEM: Compute $MinParamReach(\mathcal{A}, T, p)$.


Minimal-parameter reachability synthesis problem:
INPUT: A PTA $\mathcal{A}$, a parameter $p$, a subset $T \subseteq L$ of the
locations of $\mathcal{A}$.
PROBLEM: Synthesize $MinParamSynth(\mathcal{A}, T, p)$.


2.3 Symbolic Semantics 


Let us now recall the symbolic semantics of PTAs (see e.g. [8,19]), which we
will use to solve these problems.


Constraints. We first define operations on constraints. A linear term over
$X \cup P$ is of the form $\sum_{1 \leq i \leq |X|} a_i x_i +
\sum_{1 \leq j \leq |P|} b_j p_j + d$, with $x_i \in X$, $p_j \in P$, and
$a_i, b_j, d \in \mathbb{Z}$. A constraint $C$ (i.e. a convex polyhedron)
over $X \cup P$ is a conjunction of inequalities of the form
$lt \bowtie 0$, where $lt$ is a linear term. $\bot$ denotes the false
parameter constraint, i.e. the constraint over $P$ containing no valuation.

Given a parameter valuation $v_P$, $v_P(C)$ denotes the constraint over $X$
obtained by replacing each parameter $p$ in $C$ with $v_P(p)$. Likewise,
given a clock valuation $v_X$, $v_X(v_P(C))$ denotes the expression obtained
by replacing each clock $x$ in $v_P(C)$ with $v_X(x)$. We say that $v_P$
satisfies $C$, denoted by $v_P \models C$, if the set of clock valuations
satisfying $v_P(C)$ is non-empty. Given a parameter valuation $v_P$ and a
clock valuation $v_X$, we denote by $v_X|v_P$ the valuation over $X \cup P$
such that for all clocks $x$, $v_X|v_P(x) = v_X(x)$ and for all parameters
$p$, $v_X|v_P(p) = v_P(p)$. We use the notation $v_X|v_P \models C$ to
indicate that $v_X(v_P(C))$ evaluates to true. We say that $C$ is satisfiable
if $\exists v_X, v_P$ s.t. $v_X|v_P \models C$.

We define the time elapsing of $C$, denoted by $C^{\nearrow}$, as the
constraint over $X$ and $P$ obtained from $C$ by delaying all clocks by an
arbitrary amount of time. That is, $v_X'|v_P \models C^{\nearrow}$ iff
$\exists v_X : X \to \mathbb{R}_{\geq 0}, \exists d \in \mathbb{R}_{\geq 0}$
s.t. $v_X|v_P \models C \wedge v_X' = v_X + d$. Given $R \subseteq X$, we
define the reset of $C$, denoted by $[C]_R$, as the constraint obtained from
$C$ by resetting the clocks in $R$, and keeping the other clocks unchanged.
Given a subset $P' \subseteq P$ of parameters, we denote by
$C{\downarrow}_{P'}$ the projection of $C$ onto $P'$, i.e. obtained by
eliminating the clock variables and the parameters in $P \setminus P'$ (e.g.
using Fourier-Motzkin). Therefore, $C{\downarrow}_P$ denotes the elimination
of the clock variables only, i.e. the projection onto $P$. Given $p$, we
denote by $GetMin(C, p)$ the minimum of $p$ in a form $(c, \succ)$.
Technically, GetMin can be implemented using polyhedral operations as
follows: $C{\downarrow}_{\{p\}}$ is computed, and then the infimum is
extracted; then the operator in $\{=, >\}$ is inferred depending on whether
$C{\downarrow}_{\{p\}}$ is bounded from below using a closed or an open
constraint. We extend GetMin to accommodate clocks, thus $GetMin(C, x)$
returns the minimal clock value that $x$ can take, while conforming to $C$.
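
The last step of GetMin can be sketched for the one-dimensional case, i.e. once the projection onto $p$ has already been computed. The sketch assumes, for illustration only, that the projected constraint is non-empty and is given as a list of lower bounds `(c, strict)`, where `(c, False)` stands for $p \geq c$ and `(c, True)` for $p > c$. The function name `get_min` is hypothetical, and a real implementation would work on polyhedra instead.

```python
from fractions import Fraction

def get_min(lower_bounds):
    """Infimum of a parameter p subject to a conjunction of lower bounds.
    Each bound is (c, strict): (c, False) encodes p >= c, (c, True) encodes
    p > c. The result (c, strict) reads as (c, =) if strict is False and as
    (c, >) if strict is True."""
    # The tightest lower bound has the largest constant; for equal constants
    # an open bound wins over a closed one (False < True in tuple order).
    # Parameters are non-negative, hence the default bound p >= 0.
    return max(lower_bounds, default=(Fraction(0), False))

open_min = get_min([(Fraction(1), False), (Fraction(2), True)])    # p >= 1 and p > 2
closed_min = get_min([(Fraction(2), False), (Fraction(1), True)])  # p >= 2 and p > 1
```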

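A symbolic state is a pair $(\ell, C)$ where $\ell \in L$ is a location, and
$C$ its associated constraint, called parametric zone.


Definition 4 (Symbolic semantics). Given a PTA
$\mathcal{A} = (\Sigma, L, \ell_0, X, P, I, E)$, the symbolic semantics of
$\mathcal{A}$ is defined by the labelled transition system called the
parametric zone graph $PZG = (E, \mathbf{S}, \mathbf{s}_0, \Rightarrow)$,
with


- $\mathbf{S} = \{(\ell, C) \mid C \subseteq I(\ell)\}$, $\mathbf{s}_0 =
  (\ell_0, (\bigwedge_{1 \leq i \leq |X|} x_i = 0)^{\nearrow} \wedge
  I(\ell_0))$, and
- $((\ell, C), e, (\ell', C')) \in \Rightarrow$ if
  $e = (\ell, g, a, R, \ell') \in E$ and
  $C' = ([(C \wedge g)]_R \wedge I(\ell'))^{\nearrow} \wedge I(\ell')$ with
  $C'$ satisfiable.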


That is, in the parametric zone graph, nodes are symbolic states, and arcs
are labeled by edges of the original PTA. Given $\mathbf{s} = (\ell, C)$, if
$((\ell, C), e, (\ell', C')) \in \Rightarrow$, we write
$Succ(\mathbf{s}, e) = (\ell', C')$. By extension, we write
$Succ(\mathbf{s})$ for $\bigcup_{e \in E} Succ(\mathbf{s}, e)$. Well-known
results (see [19]) connect the concrete and the symbolic semantics.


3 Computability and Intractability 


3.1 Minimal-Time Reachability 


The following result is a consequence of a monotonicity property of L/U- 
PTAs [19]. We can safely replace parameters with some constants in order to 
compute the solution to the minimal-time reachability problem, which reduces 
to the minimal-time reachability in a TA, which is PSPACE-complete [17]. All 
proofs are given in [7]. 


Proposition 1 (minimal-time reachability for L/U-PTAs). The
minimal-time reachability problem for L/U-PTAs is PSPACE-complete.


Computing the minimal time for which a location is reached (Proposition 1)
does not mean that we are able to compute exactly all valuations for which
this location is reachable in minimal time. In fact, we show that it is not
possible in a formalism for which the emptiness of the intersection is
decidable, which notably rules out its representation as a finite union of
polyhedra. The proof idea is that representing it in such a formalism would
contradict the undecidability of the emptiness problem for (normal) PTAs.


Proposition 2 (intractability of minimal-time reachability synthesis 
for L/U-PTAs). The solution to the minimal-time reachability synthesis prob- 
lem for L/U-PTAs cannot be represented in a formalism for which the emptiness 
of the intersection is decidable. 


3.2 Minimal-Parameter Reachability 


For the full class of PTAs, we will see that these problems are clearly out of reach: 
if it was possible to compute the solution to the minimal-parameter reachability 
or minimal-parameter reachability synthesis, then it would be possible to answer 
the reachability emptiness problem—which is undecidable in most settings [6]. 

We first show that an algorithm for the minimal-parameter synthesis prob- 
lem can be used to solve the minimal-time synthesis problem, i.e. the minimal- 
parameter synthesis problem is at least as hard as the minimal-time synthesis 
problem. 


Lemma 1 (minimal-time from minimal-parameter synthesis). An algo- 
rithm that solves the minimal-parameter synthesis problem can be used to solve 
the minimal-time synthesis problem by extending the PTA. 


Proof. Assume we are given an arbitrary PTA $\mathcal{A}$, a set of target
locations $T$, and a global clock $x_{global}$ that never resets. We
construct the PTA $\mathcal{A}'$ from $\mathcal{A}$ by adding a new parameter
$p_{global}$, and for every edge $(\ell, g, a, R, \ell')$ in $\mathcal{A}'$
such that $\ell' \in T$, we replace $g$ by
$g \wedge x_{global} = p_{global}$. Note that when a target location from $T$
is reached, we have that $x_{global} = p_{global}$, hence by minimizing
$p_{global}$ we also minimize $x_{global}$. Thus, by solving
$MinParamSynth(\mathcal{A}', T, p_{global})$, we effectively solve
$MinTimeSynth(\mathcal{A}, T)$.
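
The construction used in this proof can be sketched on a simple edge-list encoding. The encoding is an assumption of this example: an edge is a tuple `(source, guard, action, resets, target)`, a guard is a list of atoms, and the names `add_global_parameter`, `x_global` and `p_global` are illustrative.

```python
def add_global_parameter(edges, targets, clock="x_global", param="p_global"):
    """Return the edges of A': every edge entering a target location gets the
    extra guard atom  clock = param."""
    new_edges = []
    for source, guard, action, resets, target in edges:
        if target in targets:
            guard = guard + [(clock, "=", param)]   # g AND x_global = p_global
        new_edges.append((source, guard, action, resets, target))
    return new_edges

edges = [("l0", [("x", ">=", 1)], "a", {"x"}, "l1"),
         ("l1", [("x", "<=", "p")], "b", set(), "goal")]
extended = add_global_parameter(edges, {"goal"})
```

Only edges entering a target location are modified, and the input list is left untouched because `guard + [...]` builds a new list.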


The following result states that synthesis of the minimal-value of the param- 
eter is intractable for PTAs. 


Proposition 3 (intractability of minimal-parameter reachability for 
PTAs). The solution to the minimal-parameter reachability for PTAs cannot 
be computed in general. 


Proof (sketch). By showing that testing equality of “p = 0” against the solution 
of the minimal-parameter reachability problem for the PTA in Fig. 2 and ee is 
equivalent to solving reachability emptiness of $\ell_f$ in $\mathcal{A}$, which is undecidable [3].
Therefore, the solution cannot be computed in general. 


The intractability of minimal-parameter reachability synthesis for PTAs will
be implied by the upcoming Proposition 4 in a more restricted setting.


Fig. 2. Intractability of minimal-parameter reachability for PTAs


Intractability of the synthesis for L/U-PTAs. The following result states that 
synthesis is intractable for L/U-PTAs. In particular, this rules out the possibility 
to represent the result using a finite union of polyhedra. 


Proposition 4 (intractability of minimal-parameter reachability syn- 
thesis for L/U-PTAs). The solution to the minimal-parameter reachability 
synthesis for L/U-PTAs cannot always be represented in a formalism for which 
the emptiness of the intersection is decidable and for which the minimization of 
a variable is computable. 


Proof. From Lemma 1 and Proposition 2. 


The minimal-parameter reachability problem remains open for L/U-PTAs
(see Sect. 7). Despite these negative results, we will define procedures that
address not only the class of L/U-PTAs, but in fact the class of full PTAs.
Of course, these procedures are not guaranteed to terminate.


4 Minimal Parameter Reachability Synthesis 


We give MinParamSynth($\mathcal{A}$, $T$, $p$) in Algorithm 1. It maintains a
set $W$ of waiting symbolic states, a set $P$ of passed states, a current
optimum $Opt$ and the associated optimal valuations $K$. While $W$ is not
empty, a state is picked in line 6. If it is a target state (i.e.
$\ell \in T$) then the projection of its constraint onto $p$ is computed, and
the minimum is inferred (line 10). If that projection improves the known
optimum, then the associated parameter valuations $K$ are completely replaced
by the one obtained from the current state (i.e. the projection of $C$ onto
$P$). Otherwise, if the minimum of $C{\downarrow}_{\{p\}}$ is equal to the
known optimum (line 14), then we add (using disjunction) the associated
valuations. Finally, if the current state is not a target state and has not
been visited before, then we compute its successors and add them to $W$ in
lines 17 and 18.

Note that if $W$ is implemented as a FIFO list with “pick” the first element,
then this algorithm is a classical BFS procedure.

Also note that if we replace lines 10-15 with the statement
$K \leftarrow K \vee C{\downarrow}_P$ (i.e. adding the parameter valuations
to $K$ every time the algorithm reaches a target location), we obtain the
standard synthesis algorithm EFSynth from e.g. [20], which synthesizes all
parameter valuations for which a set of locations is reachable.


Algorithm 1: MinParamSynth($\mathcal{A}$, $T$, $p$)

input  : A PTA $\mathcal{A}$ with symbolic initial state $s_0 = (\ell_0, C_0)$,
         a set of target locations $T$, a parameter $p$.
output : Constraint $K$ over the parameters.

 1  $W \leftarrow \{s_0\}$                        // waiting set
 2  $P \leftarrow \emptyset$                      // passed set
 3  $Opt \leftarrow \infty$                       // current optimum
 4  $K \leftarrow \bot$                           // current optimum valuations
 5  while $W \neq \emptyset$ do
 6      Pick $s = (\ell, C)$ from $W$
 7      $W \leftarrow W \setminus \{s\}$
 8      $P \leftarrow P \cup \{s\}$
 9      if $\ell \in T$ then                      // s is a target state
10          $s_{opt} \leftarrow GetMin(C, p)$     // compute local optimum
11          if $s_{opt} < Opt$ then               // the optimum is strictly better
12              $Opt \leftarrow s_{opt}$          // we found a new best optimum: replace it
13              $K \leftarrow C{\downarrow}_P$    // completely replace the found valuations
14          else if $s_{opt} = Opt$ then          // the optimum is equal to the one known
15              $K \leftarrow K \vee C{\downarrow}_P$   // add the found valuations
16      else                                      // otherwise explore successors
17          for each $s' \in Succ(s)$ do
18              if $s' \notin W \wedge s' \notin P$ then $W \leftarrow W \cup \{s'\}$
19  return $K$
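
The control flow of Algorithm 1 can be sketched in Python as follows. This is not the IMITATOR implementation: the polyhedral operations are abstracted into hypothetical callbacks `succ`, `get_min` and `project`, minima are encoded as pairs `(c, strict)` with `strict=False` for $=$ and `True` for $>$, the result $K$ is a list of disjuncts, and the sketch returns the optimum together with $K$. The toy zone graph and its minima are invented for illustration.

```python
from collections import deque
import math

def min_param_synth(s0, targets, succ, get_min, project):
    waiting = deque([s0])          # W, used as a FIFO queue (BFS)
    passed = set()                 # P
    opt = (math.inf, False)        # Opt = infinity
    K = []                         # K = false (empty disjunction)
    while waiting:
        s = waiting.popleft()      # lines 6-7
        passed.add(s)              # line 8
        loc, C = s
        if loc in targets:         # line 9
            s_opt = get_min(C)     # line 10
            if s_opt < opt:        # lines 11-13: strictly better optimum
                opt, K = s_opt, [project(C)]
            elif s_opt == opt:     # lines 14-15: equally good optimum
                K.append(project(C))
        else:                      # lines 16-18
            for s2 in succ(s):
                if s2 not in waiting and s2 not in passed:
                    waiting.append(s2)
    return opt, K

# Hypothetical zone graph; constraints are opaque labels.
graph = {("l1", "C1"): [("l3", "C2"), ("l2", "C3")],
         ("l2", "C3"): [("l3", "C4"), ("l3", "C5")]}
minima = {"C2": (2, True), "C4": (2, False), "C5": (2, False)}
opt, K = min_param_synth(("l1", "C1"), {"l3"},
                         succ=lambda s: graph.get(s, []),
                         get_min=minima.__getitem__,
                         project=lambda C: C)
```

Using a `deque` with `popleft` corresponds to the FIFO choice of “pick”, i.e. a BFS exploration.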




Fig. 3. PTA exemplifying Algorithm 1. 


Example 2. Consider the PTA $\mathcal{A}$ in Fig. 3, and run
MinParamSynth($\mathcal{A}$, $\{\ell_3\}$, $p_1$). The initial state is
$s_1 = (\ell_1, x \geq 0)$ (we omit the trivial constraints $p_i \geq 0$).
Its successors $s_2 = (\ell_3, x > 2 \wedge p_1 > 2)$ and
$s_3 = (\ell_2, x \geq 0 \wedge p_2 > 1)$ are added to $W$. Pick $s_2$ from
$W$: it is a target, and therefore $GetMin(C_2, p_1)$ is computed, which
gives $(2, >)$. Since $(2, >) < \infty$, we found a new minimum, and $K$
becomes $C_2{\downarrow}_P$, i.e. $p_1 > 2$. Pick $s_3$ from $W$: it is not a
target, therefore we compute its successors
$s_4 = (\ell_3, x > 2 \wedge p_1 = 2 \wedge 1 < p_2 < 2)$ and
$s_5 = (\ell_3, x > 2 \wedge p_1 = p_3 = 2 \wedge p_2 > 1)$. Pick $s_4$: it
is a target, with $GetMin(C_4, p_1) = (2, =)$. As $(2, =) < (2, >)$, we found
a new minimum, and $K$ is replaced with $C_4{\downarrow}_P$, i.e.
$p_1 = 2 \wedge 1 < p_2 < 2$. Pick $s_5$: it is a target, with
$GetMin(C_5, p_1) = (2, =)$. As $(2, =) = (2, =)$, we found an equally good
minimum, and $K$ is improved with $C_5{\downarrow}_P$, giving a new $K$ equal
to $(p_1 = 2 \wedge 1 < p_2 < 2) \vee (p_1 = p_3 = 2 \wedge p_2 > 1)$. As
$W = \emptyset$, $K$ is returned.


Algorithm 1 is a semi-algorithm; if it terminates with result $K$, then $K$
is a solution for the MinParamSynth problem. Correctness follows from the
fact that the algorithm explores the entire parametric zone graph, except for
successors of target states (from [19,20] we have that successors of a
symbolic state can only restrict the parameter constraint, hence we cannot
improve). Furthermore, the minimum is tracked and updated whenever a target
state is reached.

We show that synthesis can effectively be achieved for PTAs with a single
clock, a decidable subclass.


Proposition 5 (synthesis for one-clock PTAs). The solution to the 
minimal-parameter reachability synthesis can be computed for 1-clock PTAs 
using a finite union of polyhedra. 


5 Minimal Time Reachability Synthesis 


For minimal-time reachability and synthesis, we assume that the PTA contains
a global clock $x_{global}$ that is never reset. Otherwise, we extend the PTA
by simply adding a ‘dummy’ clock $x_{global}$ without any associated guards,
invariants or resets.


Algorithm 2: MinTimeSynth($\mathcal{A}$, $T$, $x_{global}$)

input  : A PTA $\mathcal{A}$ with symbolic initial state $s_0 = (\ell_0, C_0)$,
         a set of target locations $T$, a global clock $x_{global}$ that
         never resets.
output : Minimal time $T_{opt}$ and constraint $K$ over the parameters.

 1  $Q \leftarrow \{(0, s_0)\}$                   // priority queue ordered by time
 2  $P \leftarrow \emptyset$                      // passed set
 3  $K \leftarrow \bot$                           // current optimum parameter valuations
 4  $T_{opt} \leftarrow \infty$                   // current optimum time
 5  while $Q \neq \emptyset$ do
 6      $(t, s = (\ell, C)) \leftarrow Q.Pop()$   // take head of the queue and remove it
 7      $P \leftarrow P \cup \{s\}$
 8      if $t > T_{opt}$ then break
 9      else if $\ell \in T$ then                 // when s is a target state and $t \leq T_{opt}$
10          $K \leftarrow K \vee (C \wedge x_{global} = t){\downarrow}_P$   // valuations for which $t = T_{opt}$
11      else                                      // otherwise explore successors
12          for each $s' \in Succ(s)$ do
13              if $s' \in Q \vee s' \in P$ then continue   // ignore seen states
14              $t' \leftarrow GetMin(s'.C, x_{global})$    // get minimal time of $s'.C$
15              if $t' \leq T_{opt}$ then         // only add states not exceeding $T_{opt}$
16                  if $s'.\ell \in T \wedge t' < T_{opt}$ then
17                      $T_{opt} \leftarrow t'$   // new lower time to target
18                  $Q.Push((t', s'))$            // add to the priority queue
19  return $(T_{opt}, K)$


We give MinTimeSynth($\mathcal{A}$, $T$, $x_{global}$) in Algorithm 2. We
maintain a priority queue $Q$ of waiting symbolic states and order these by
their minimal time (for the initial state this is 0). We further maintain a
set $P$ of passed states, a current time optimum $T_{opt}$ (initially
$\infty$), and the associated optimal valuations $K$. We first explain the
synthesis algorithm and then the reachability variant.


Minimal-time reachability synthesis. While $Q$ is not empty, the state with
the lowest associated minimal time $t$ is popped from the head of the queue
(line 6). If this time $t$ is larger than $T_{opt}$ (line 8), then this also
holds for all remaining states in $Q$. Also all successor states from $s$ (or
successors of any state from $Q$) cannot have a better minimal time, thus we
can end the algorithm.

Otherwise, if $s$ is a target state, we assume that $t \leq T_{opt}$ and thus
$t = T_{opt}$ (we guarantee this property when pushing states to the queue).
Before adding the parameter valuations to $K$ in line 10, we intersect the
constraint with $x_{global} = t$ in case the clock value depends on
parameters, e.g. if $C$ is $x_{global} = p$.⁴

If $s$ is not a target state, then we consider its successors in lines 12-18.
We ignore states that have been visited before (line 13), and compute the
minimal time $t'$ of $s'$ in line 14. We compare $t'$ with $T_{opt}$ in
line 15. All successor states for which $t'$ exceeds $T_{opt}$ are ignored,
as they cannot improve the result.

If $s'$ is a target state and $t' < T_{opt}$, then we update $T_{opt}$.
Finally, the successor state is pushed to the priority queue in line 18. Note
that we preserve the property that $t \leq T_{opt}$ for the states in $Q$.


Minimal-time reachability. When we are interested in just a single parameter
valuation, we may end the algorithm early. The algorithm can be terminated as
soon as it reaches line 10. We can assert at this point that $T_{opt}$ will
not decrease any further, since all remaining unexplored states have a
minimal time that is larger than or equal to $T_{opt}$.

Algorithm 2 is a semi-algorithm; if it terminates with result
$(T_{opt}, K)$, then $K$ is a solution for the MinTimeSynth problem.
Correctness follows from the fact that the algorithm explores exactly all
symbolic states in the parametric zone graph that can be reached in at most
$T_{opt}$ time, except for successors of target states. Note (again) that
successors of a symbolic state can only restrict the parameter constraint.
Furthermore, $T_{opt}$ is checked and updated for every encountered successor
to ensure that the first time a target state is popped from the priority
queue $Q$, it is reached in $T_{opt}$ time (after which $T_{opt}$ never
changes).
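
The priority-queue exploration of Algorithm 2 can be sketched in Python with the standard `heapq` module. As before, this is a sketch under assumptions and not the IMITATOR implementation: `succ`, `get_min_time` and `valuations_at` are hypothetical callbacks standing for the polyhedral operations, times are encoded as pairs `(c, strict)`, $K$ is a list of disjuncts, and the toy zone graph is invented for illustration.

```python
import heapq
import itertools
import math

def min_time_synth(s0, targets, succ, get_min_time, valuations_at):
    tie = itertools.count()                   # tie-breaker for equal times
    queue = [((0, False), next(tie), s0)]     # Q, ordered by minimal time
    queued, passed = {s0}, set()              # states in Q, and P
    K = []                                    # K = false (empty disjunction)
    t_opt = (math.inf, False)                 # T_opt = infinity
    while queue:
        t, _, s = heapq.heappop(queue)        # line 6
        queued.discard(s)
        passed.add(s)                         # line 7
        loc, C = s
        if t > t_opt:                         # line 8
            break
        elif loc in targets:                  # lines 9-10
            K.append(valuations_at(C, t))
        else:                                 # lines 11-18
            for s2 in succ(s):
                if s2 in queued or s2 in passed:
                    continue                  # ignore seen states
                t2 = get_min_time(s2[1])
                if t2 <= t_opt:               # only keep states within T_opt
                    if s2[0] in targets and t2 < t_opt:
                        t_opt = t2            # new lower time to target
                    heapq.heappush(queue, (t2, next(tie), s2))
                    queued.add(s2)
    return t_opt, K

# Hypothetical zone graph; constraints are opaque labels.
graph = {("l0", "C0"): [("l1", "C1"), ("goal", "Ca")],
         ("l1", "C1"): [("goal", "Cb"), ("goal", "Cc"), ("l2", "Cd")]}
times = {"C1": (1, False), "Ca": (5, False), "Cb": (3, False),
         "Cc": (3, False), "Cd": (4, False)}
t_opt, K = min_time_synth(("l0", "C0"), {"goal"},
                          succ=lambda s: graph.get(s, []),
                          get_min_time=times.__getitem__,
                          valuations_at=lambda C, t: C)
```

The counter stored in each queue entry ensures that two entries with equal times are never compared on the state itself. For the reachability variant, the loop could return as soon as the first target state is popped.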


6 Experiments 


We implemented all our algorithms in the IMITATOR tool [9] and compared their 
performance with the standard (non-minimization) EFSynth parameter synthesis 
algorithm from [20]. For the experiments, we are interested in analysing the 
performance (in the form of computation time) of each algorithm, and comparing 
that with the performance of standard synthesis. 


Benchmark models. We collected PTA models and properties from the IMITATOR
benchmarks library [5] which contains numerous benchmark models from
scientific and industrial domains. We selected all models with reachability
properties and extended these to include: (1) a new clock variable that
represents the global time $x_{global}$, i.e. a clock that does not reset,
and (2) a new parameter $p_{global}$ along with the linear term
$x_{global} = p_{global}$ for every transition that targets a goal location,
to ensure that when minimizing $p_{global}$ we effectively minimize
$x_{global}$. In total we have 68 models, and for every experiment we used
the extended model that includes both the global time clock $x_{global}$ and
the corresponding parameter $p_{global}$.


4 In case $t$ is of the form $(c, >)$ with $c \in \mathbb{Q}_+$, then the
intersection of $C$ with the linear term $x_{global} = t$ would result in
$\bot$, as the exact value $t$ is not part of the constraint. In the
implementation, we intersect $C$ with $x_{global} = t + \epsilon$, for a
small $\epsilon > 0$.


Subsumption. For each algorithm that we consider, it is possible to reduce the 
search space with the following two reduction techniques: 


- State inclusion [18]: Given two symbolic states $s_1 = (\ell_1, C_1)$ and
  $s_2 = (\ell_2, C_2)$ with $\ell_1 = \ell_2$, we say that $s_1$ is included
  in $s_2$ if all parameter valuations for $s_1$ are also contained in $s_2$,
  e.g. $C_1$ is $p > 5$ and $C_2$ is $p > 2$. We may then conclude that $s_1$
  is redundant and can be ignored. This check can be performed in the
  successor computation (Succ) to remove included states, without altering
  correctness for minimal-time (or parameter) synthesis.

- State merging [10]: Two states $s_1 = (\ell_1, C_1)$ and
  $s_2 = (\ell_2, C_2)$ can be merged if $\ell_1 = \ell_2$ and
  $C_1 \cup C_2$ is a convex polyhedron. The resulting state
  $(\ell_1, C_1 \cup C_2)$ replaces $s_1$ and $s_2$ and is an
  over-approximation of both states. However, reachable locations,
  minimality, and executable actions are preserved.
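
The state inclusion check can be illustrated on a deliberately simplified constraint representation. The sketch below assumes that each constraint is a box, i.e. a dictionary mapping every variable to a closed interval `(lo, hi)`, and that both constraints mention the same variables. Actual parametric zones are general convex polyhedra, so this is an illustration of the idea and not the check used in the tool. The names `included` and `prune_included` are hypothetical.

```python
import math

def included(c1, c2):
    """True if every valuation of c1 also belongs to c2 (same variables)."""
    return all(c2[v][0] <= lo and hi <= c2[v][1] for v, (lo, hi) in c1.items())

def prune_included(states):
    """Drop every symbolic state that is included in another state with the
    same location; the first of two equal states is kept."""
    kept = []
    for loc, c in states:
        if any(l2 == loc and included(c, c2) for l2, c2 in kept):
            continue  # the new state is redundant
        # the new state may in turn make earlier states redundant
        kept = [(l2, c2) for l2, c2 in kept
                if not (l2 == loc and included(c2, c))]
        kept.append((loc, c))
    return kept

states = [("l1", {"p": (5, math.inf)}),   # p >= 5
          ("l1", {"p": (2, math.inf)}),   # p >= 2, subsumes the first state
          ("l2", {"p": (5, math.inf)})]   # other location: kept
result = prune_included(states)
```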


State inclusion is a relatively inexpensive computational task and preliminary 
results showed that it caused the algorithm to perform equally fast or faster than 
without the check. Checking for merging is however a computationally expensive 
procedure and thus should not be performed for every newly found state. For all 
BFS-based algorithms (standard synthesis and minimal-parameter synthesis) we 
merge every BFS layer. For the minimal-time synthesis algorithm, we empirically 
studied various merging heuristics and found that merging every ten iterations 
of the algorithm yielded the best results. We assume that both the inclusion
and merging state-space reductions are used in all experiments (all computation
times include the overhead of the reductions), unless otherwise mentioned.


Run configurations. For the experiments we used the following configurations: 


- MTReach: Minimal-time reachability,
- MTSynth: Minimal-time synthesis,
- MTSynth-noRed: Minimal-time synthesis, without reductions,
- MPReach: Minimal-parameter reachability (of $p_{global}$),
- MPSynth: Minimal-parameter synthesis (of $p_{global}$), and
- EFSynth: Classical reachability synthesis.


Experimental setup. We performed all our experiments on an Intel® Core™
i7-4710MQ processor with 2.50 GHz and 7.4 GiB memory, using a single thread.
The six run configurations were executed on each benchmark model, with a
timeout of 3600 s. All our models, results, and information on how to
reproduce the results are available on
https://github.com/utwente-fmt/OptTime-TACAS19.


Results. The results of our experiments are displayed in Fig. 4.

MTSynth vs EFSynth. We observe that for most of the models MTSynth 
clearly outperforms EFSynth. This is to be expected since all states that take 
more than the minimal time can be ignored. Note that the experiments that 
appear on a vertical line between 0.1s < x < 1s are a scaled-up variant of the 
same model, indicating that this scaling does not affect minimal-time synthesis. 
Finally, the model plotted at (1346, 52) does not heavily modify the clocks.
As a consequence, MTSynth has to explore most of the state space while
continuously having to extract the time constraints, making it inefficient.


Fig. 4. Scatterplot comparisons of different algorithm configurations. The
marks on the red dashed line did not finish computing within the allowed time
(3600 s). (Color figure online)


MPSynth vs EFSynth. We can see that MPSynth performs more similarly to
EFSynth than MTSynth does, which is to be expected as the algorithms differ
less. Still, MPSynth significantly outperforms EFSynth. This is also because
fewer states have to be explored to guarantee optimality (once a parameter
exceeds the minimal value, all its successors can be ignored).

MTSynth vs MPSynth. Here, we find that MTSynth outperforms MPSynth, similar
to the comparison with EFSynth. The results also show a second scalable model
around (0.003, 10) and we see that MPSynth is able to solve the ‘bad
performing model’ for MTSynth as quickly as EFSynth. Still, we can conclude
that the minimal-time synthesis problem is in general more efficiently solved
with the MTSynth algorithm.

MTSynth vs MTSynth-noRed. Here we can see the advantage of using the
inclusion and merging reductions to reduce the search space. For most models
there is little to no improvement, but for others it makes a large
difference. While there is some computational overhead in performing these
reductions, this overhead is not significant enough to outweigh their
benefits.

MTReach vs MTSynth. With MTReach we expect faster execution times as 
the algorithm terminates once a parameter valuation is found. The experiments 
show that this is indeed the case (mostly visible from the timeout line). How- 
ever, we also observe that for quite a few models the difference is not as signifi- 
cant, implying that synthesis results can often be quickly obtained once a single 
minimal-time valuation is found. 

MPReach vs MPSynth. Here we also expect MPReach to be faster than its 
synthesis variant. While it does quickly solve six instances for which MPSynth 
timed out, other than that there is no real performance gain. We also argue here 
that synthesis is obtained quickly when a minimal parameter bound is found. 
Of course we are effectively computing a minimal global time, so results may 
change when a different parameter is minimized. 


7 Conclusion 


We have designed and implemented several algorithms to solve the minimal-time 
parameter synthesis and related problems for PTAs. From our experiments we 
observed in general that minimal-time reachability synthesis is in fact faster to 
compute compared to standard synthesis. We further show that synthesis while 
minimizing a parameter is also more efficient, and that existing search space 
reductions apply well to our algorithms. 

Aside from the performance improvement, we deem minimal-time reachabil- 
ity synthesis to be useful in practice. It allows for evaluating which parameter 
valuations guarantee that the goal is reached in minimal time. We consider it 
particularly valuable when reasoning about real-time systems. 

On the theoretical side, we did not address the minimal-parameter
reachability problem for L/U-PTAs (we only showed intractability of the
synthesis). While finding the minimal valuation of a given lower-bound
parameter is trivial (the answer is 0 iff the target location is reachable),
finding the minimum of an upper-bound parameter boils down to
reachability-synthesis for U-PTAs, a problem that remains open in general (it
is only solvable for integer-valued parameters [15]), as well as to shrinking
timed automata [23], but with 0-coefficients in the shrinking vector, which
is not allowed in [23].

A direction for future work is to improve performance by exploiting
parallelism. Parallel random search could significantly speed up the
computation process, as demonstrated for timed automata [24,25]. Another
interesting research direction is to look at maximizing the time to reach the
target, or to minimize the upper-bound time to reach the target (e.g. for
minimizing the worst-case response-time in real-time systems); a preliminary
study suggests that the latter problem is significantly more complex than the
minimal-time synthesis problem.
One may also study other quantitative criteria, e.g. minimizing cost parameters.


Abstract. Many problems in reactive synthesis are stated using two
formulas—an environment assumption and a system guarantee—and 
ask for an implementation that satisfies the guarantee in environments 
that satisfy their assumption. Reactive synthesis tools often produce 
strategies that formally satisfy such specifications by actively preventing 
an environment assumption from holding. While formally correct, such 
strategies do not capture the intention of the designer. We introduce an 
additional requirement in reactive synthesis, non-conflictingness, which 
asks that a system strategy should always allow the environment to fulfill 
its liveness requirements. We give an algorithm for solving GR(1) syn- 
thesis that produces non-conflicting strategies. Our algorithm is given by 
a 4-nested fixed point in the μ-calculus, in contrast to the usual 3-nested
fixed point for GR(1). Our algorithm ensures that, in every environment 
that satisfies its assumptions on its own, traces of the resulting imple- 
mentation satisfy both the assumptions and the guarantees. In addition, 
the asymptotic complexity of our algorithm is the same as that of the 
usual GR(1) solution. We have implemented our algorithm and show 
how its performance compares to the usual GR(1) synthesis algorithm. 


1 Introduction 


Reactive synthesis from temporal logic specifications provides a methodology to 
automatically construct a system implementation from a declarative specifica- 
tion of correctness. Typically, reactive synthesis starts with a set of requirements 
on the system and a set of assumptions about the environment. The objective of 
the synthesis tool is to construct an implementation that ensures all guarantees 
are met in every environment that satisfies all the assumptions; formally, the 
synthesis objective is an implication A ⇒ G. In many synthesis problems, the
system can actively influence whether an environment satisfies its assumptions. 
In such cases, an implementation that prevents the environment from satisfying 
its assumptions is considered correct for the specification: since the antecedent 
of the implication A ⇒ G does not hold, the property is satisfied.
Fig. 1. Pictorial representation of a desired strategy for a robot (square) moving in a
maze in presence of a moving obstacle (circle). Obstacle and robot start in the lower
left and right corner, can move at most one step at a time (to non-occupied cells) and
cells that they should visit infinitely often are indicated in light and dark gray (see q0),
respectively. Nodes with self-loops (q1, q3, q6, q8) can be repeated finitely often with the
obstacle located at one of the dotted positions.


Such implementations satisfy the letter of the specification but not its intent. 
Moreover, assumption-violating implementations are not a theoretical curiosity 
but are regularly produced by synthesis tools such as slugs [14]. In recent years, 
a lot of research has thus focused on how to model environment assumptions [2, 
4,5,11,18], so that assumption-violating implementations are ruled out. Existing 
research either removes the “zero sum” assumption on the game by introducing 
different levels of co-operation [5], by introducing equilibrium notions inspired by 
non-zero sum games [7,16,20], or by introducing richer quantitative objectives 
on top of the temporal specifications [1,3]. 


Contribution. In this paper, we take an alternative approach. We consider the 
setting of GR(1) specifications, where assumptions and guarantees are both
conjunctions of safety and Büchi properties [6]. GR(1) has emerged as an expressive
specification formalism [17,24,28] and, unlike full linear temporal logic, synthesis 
for GR(1) can be implemented in time quadratic in the state/transition space. 
In our approach, the environment is assumed to satisfy its assumptions provided 
the system does not prevent this. Conversely, the system is required to pick a 
strategy that ensures the guarantees whenever the assumptions are satisfied, but 
additionally ensures non-conflictingness: along each finite prefix of a play accord- 
ing to the strategy, there exists the persistent possibility for the environment to 
play such that its liveness assumptions will be met. 

Our main contribution is to show a μ-calculus characterization of winning
states (and winning strategies) that rules out system strategies that are winning 
by preventing the environment from fulfilling its assumptions. Specifically, we 
provide a 4-nested fixed point that characterizes winning states and strategies 
that are non-conflicting and ensure all guarantees are met if all the assump- 
tions are satisfied. Thus, if the environment promises to satisfy its assumption if 
allowed, the resulting strategy ensures both the assumption and the guarantee. 

Our algorithm does not introduce new notions of winning, or new logics or 
winning conditions. Moreover, since μ-calculus formulas with d alternations can
be computed in $O(n^{\lceil d/2 \rceil})$ time [8,26], the $O(n^2)$ asymptotic complexity of the
new symbolic algorithm is the same as that of the standard GR(1) algorithm.


Motivating Example. Consider a small two-dimensional maze with 3 x 2 cells 
as depicted in Fig. 1, state q0. A robot (square) and an obstacle (circle) are
Fig. 2. Pictorial representation of the GR(1) winning strategy synthesized by slugs
for the robot (square) in the game described in Fig. 1. 


located in this maze and can move at most one step at a time to non-occupied 
cells. There is a wall between the lower and upper left cell and the lower and 
upper right cell. The interaction between the robot and the obstacle is as follows:
first the environment chooses where to move the obstacle to, and, after observing 
the new location of the obstacle, the robot chooses where to move. 

Our objective is to synthesize a strategy for the robot s.t. it visits both the 
upper left and the lower right corner of the maze (indicated in dark gray in 
Fig. 1, state q0) infinitely often. Due to the walls in the maze the robot needs to
cross the two white middle cells infinitely often to fulfill this task. If we assume 
an arbitrary, adversarial behavior of the environment (e.g., placing the obstacle 
in one white cell and never moving it again) this desired robot behavior cannot 
be enforced. We therefore assume that the obstacle is actually another robot 
that is required to visit the lower left and the upper right corner of the maze 
(indicated in light gray in Fig. 1, state q0) infinitely often. While we do not know
the precise strategy of the other robot (i.e., the obstacle), its liveness assumption 
is enough to infer that the obstacle will always eventually free the white cells. 
Under this assumption the considered synthesis problem has a solution. 

Let us first discuss one intuitive strategy for the robot in this scenario, as 
depicted in Fig. 1. We start in q0 with the obstacle (circle) located in the lower
left corner and the robot (square) located in the lower right corner. Recall that
the obstacle will eventually move towards the upper right corner. The robot can
therefore wait until it does so, indicated by q1. Here, the dotted circles denote
possible locations of the obstacle during the (finitely many) repetitions of q1 by
following its self loop. Whenever the obstacle moves to the upper part of the 
maze, the robot moves into the middle part (q2). Now it waits until the obstacle 
reaches its goal in the upper right, which is ensured to happen after a finite 
number of visits to q3. When the obstacle reaches the upper right, the robot 
moves up as well (q4). Now the robot can freely move to its goal in the upper 
left (q5). This process symmetrically repeats for moving back to the respective 
goals in the lower part of the maze (q6 to q8 and then back to q0). With this
strategy, the interaction between environment and system goes on for infinitely 
many cycles and the robot fulfills its specification. 

The outlined synthesis problem can be formalized as a two player game with 
GR(1) winning condition. When solving this synthesis problem using the tool 
slugs [14], we obtain the strategy depicted in Fig.2 (not the desired one in 
Fig. 1). The initial state, denoted by q0, is the same as in Fig. 1 and if the
environment moves the obstacle into the middle passage (q1) the robot reacts as


232 R. Majumdar et al. 


before; it waits until the obstacle eventually proceeds to the upper part of the
maze (q2). However, after this happens the robot takes the chance to simply 
move to the lower left cell of the maze and stays there forever (q3). By this, 
the robot prevents the environment from fulfilling its objective. Similarly, if the 
obstacle does not immediately start moving in q0, the robot takes the chance to
place itself in the middle passage and stays there forever (q4). This obviously 
prevents the environment from fulfilling its liveness properties. 

In contrast, when using our new algorithm to solve the given synthesis prob- 
lem, we obtain the strategy given in Fig. 1, which satisfies the guarantees while 
allowing the environment assumptions to be satisfied. 


Related Work. Our algorithm is inspired by supervisory controller synthesis 
for non-terminating processes [23,27], resulting in a fixed-point algorithm over a 
Rabin-Büchi automaton. This algorithm has been simplified for two interacting
Büchi automata in [22] without proof. We adapt this algorithm to GR(1) games
and provide a new, self-contained proof in the framework of two-player games, 
which is distinct from the supervisory controller synthesis setting (see [13,25] for 
a recent comparison of both frameworks). 

The problem of correctly handling assumptions in synthesis has recently 
gained attention in the reactive synthesis community [4]. As our work does 
not assume precise knowledge about the environment strategy (or the ability 
to impose the latter), it is distinct from cooperative approaches such as
assume-guarantee [9] or rational synthesis [16]. It is most closely related to obliging games [10],
cooperative reactive synthesis [5], and assume-admissible synthesis [7]. Obliging 
games [10] incorporate a similar notion of non-conflictingness as our work, but 
do not condition winning of the system on the environment fulfilling the assump- 
tions. This makes obliging games harder to win. Cooperative reactive synthesis 
[5] tries to find a winning strategy enforcing A ∧ G. If this specification is not
realizable, it is relaxed and the obtained system strategy enforces the guaran- 
tees if the environment cooperates “in the right way”. Instead, our work always 
assumes the same form of cooperation, coinciding with just one cooperation
level in [5]. Assume-admissible synthesis [7] for two players results in two individual
synthesis problems. Given that both have a solution, only implementing
the system strategy ensures that the game will be won if the environment plays 
admissible. This is comparable to the view taken in this paper; however, assuming
that the environment plays admissible is stronger than our assumption on
an environment attaining its liveness properties if not prevented from doing so.
Moreover, we only need to solve one synthesis problem, instead of two. However,
it should be noted that [5,7,10] handle ω-regular assumptions and guarantees.
We focus on the practically important GR(1) fragment and our method better 
leverages the computational benefits for this fragment. 

All proofs of our results and additional examples can be found in the extended 
version [21]. We further acknowledge that the same problem was independently 
solved in the context of reactive robot mission plans [12] which was brought to 
our attention only shortly before the final submission of this paper. 
2 Two Player Games and the Synthesis Problem
2.1 Two Player Games 


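Formal Languages. Let $\Sigma$ be a finite alphabet. We write $\Sigma^*$, $\Sigma^+$, and $\Sigma^\omega$ for
the sets of finite words, non-empty finite words, and infinite words over $\Sigma$. We
write $w \leq v$ (resp., $w < v$) if $w$ is a prefix of $v$ (resp., a strict prefix of $v$). The
set of all prefixes of a word $w \in \Sigma^\omega$ is denoted $\mathrm{pfx}(w) \subseteq \Sigma^*$. For $L \subseteq \Sigma^*$, we
have $L \subseteq \mathrm{pfx}(L)$. For $\mathcal{L} \subseteq \Sigma^\omega$ we denote by $\overline{\mathcal{L}}$ its complement $\Sigma^\omega \setminus \mathcal{L}$.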


Game Graphs and Strategies. A two player game graph $H =
(Q^0, Q^1, \delta^0, \delta^1, q_0)$ consists of two finite disjoint state sets $Q^0$ and $Q^1$, two
transition functions $\delta^0 : Q^0 \to 2^{Q^1}$ and $\delta^1 : Q^1 \to 2^{Q^0}$, and an initial state $q_0 \in Q^0$.
We write $Q = Q^0 \cup Q^1$. Given a game graph $H$, a strategy for player 0 is
a function $f^0 : (Q^0 Q^1)^* Q^0 \to Q^1$; it is memoryless if $f^0(\nu q^0) = f^0(q^0)$ for
all $\nu \in (Q^0 Q^1)^*$ and all $q^0 \in Q^0$. A strategy $f^1 : (Q^0 Q^1)^* \to Q^0$ for player
1 is defined analogously. The infinite sequence $\pi \in (Q^0 Q^1)^\omega$ is called a play
over $H$ if $\pi(0) = q_0$ and for all $k \in \mathbb{N}$ it holds that $\pi(2k+1) \in \delta^0(\pi(2k))$ and
$\pi(2k+2) \in \delta^1(\pi(2k+1))$; $\pi$ is compliant with $f^0$ and/or $f^1$ if additionally
$f^0(\pi|_{[0,2k]}) = \pi(2k+1)$ and/or $f^1(\pi|_{[0,2k+1]}) = \pi(2k+2)$ holds. We denote by
$\mathcal{L}(H, f^0)$, $\mathcal{L}(H, f^1)$ and $\mathcal{L}(H, f^0, f^1)$ the set of plays over $H$ compliant with $f^0$,
$f^1$, and both $f^0$ and $f^1$, respectively.


Winning Conditions. We consider winning conditions defined over sets of
states of a given game graph $H$. Given $F \subseteq Q$, we say a play $\pi$ satisfies
the Büchi condition $F$ if $\mathrm{Inf}(\pi) \cap F \neq \emptyset$, where $\mathrm{Inf}(\pi) = \{q \in Q \mid \pi(k) =
q \text{ for infinitely many } k \in \mathbb{N}\}$. Given a set $\mathcal{F} = \{F_1, \ldots, F_m\}$, where each $F_i \subseteq Q$,
we say a play $\pi$ satisfies the generalized Büchi condition $\mathcal{F}$ if $\mathrm{Inf}(\pi) \cap F_i \neq \emptyset$
for each $i \in [1; m]$. We additionally consider generalized reactivity winning
conditions with rank 1 (GR(1) winning conditions in short). Given two generalized
Büchi conditions $\mathcal{F}^0 = \{F^0_1, \ldots, F^0_m\}$ and $\mathcal{F}^1 = \{F^1_1, \ldots, F^1_n\}$, a play $\pi$
satisfies the GR(1) condition if either $\mathrm{Inf}(\pi) \cap F^0_i = \emptyset$ for some $i \in [1; m]$ or
$\mathrm{Inf}(\pi) \cap F^1_j \neq \emptyset$ for each $j \in [1; n]$. That is, whenever the play satisfies $\mathcal{F}^0$, it
also satisfies $\mathcal{F}^1$. We use the tuples $(H, F)$, $(H, \mathcal{F})$ and $(H, \mathcal{F}^0, \mathcal{F}^1)$ to denote a
Büchi, generalized Büchi and GR(1) game over $H$, respectively, and collect all
winning plays in these games in the sets $\mathcal{L}(H, F)$, $\mathcal{L}(H, \mathcal{F})$ and $\mathcal{L}(H, \mathcal{F}^0, \mathcal{F}^1)$. A
strategy $f^1$ is winning for player 1 in a Büchi, generalized Büchi, or GR(1) game,
if $\mathcal{L}(H, f^1)$ is contained in the respective set of winning plays.
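The three winning conditions can be illustrated on ultimately periodic ("lasso") plays, where the set of states visited infinitely often is exactly the set of states on the repeated cycle. The following is a minimal sketch under that assumption; the function names and the example states are hypothetical and not taken from the paper.

```python
from typing import Hashable, Iterable, Sequence, Set


def satisfies_buchi(cycle: Sequence[Hashable], F: Set[Hashable]) -> bool:
    """Buechi condition for a lasso play prefix.cycle^omega.

    Assumption: the cycle is non-empty, so Inf(play) == set(cycle).
    """
    return bool(set(cycle) & set(F))


def satisfies_generalized_buchi(cycle: Sequence[Hashable],
                                Fs: Iterable[Set[Hashable]]) -> bool:
    """Every set of the generalized Buechi condition is visited infinitely often."""
    return all(satisfies_buchi(cycle, F) for F in Fs)


def satisfies_gr1(cycle: Sequence[Hashable],
                  assumptions: Iterable[Set[Hashable]],
                  guarantees: Iterable[Set[Hashable]]) -> bool:
    """GR(1): some assumption set is missed, or all guarantee sets are hit."""
    assumptions = list(assumptions)
    guarantees = list(guarantees)
    return (not satisfies_generalized_buchi(cycle, assumptions)
            or satisfies_generalized_buchi(cycle, guarantees))


# Hypothetical lasso: the play eventually alternates between e0 and s0.
CYCLE = ["e0", "s0"]
```

Note that the second disjunct is only relevant when all assumption sets are visited infinitely often; a play that misses an assumption set satisfies the GR(1) condition vacuously, which is exactly the loophole discussed in the introduction.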


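Set Transformers on Games. Given a game graph $H$, we define the existential,
universal, and player 0-, and player 1-controllable pre-operators. Let $P \subseteq Q$.

$$\mathrm{Pre}^{\exists}(P) = \{q^0 \in Q^0 \mid \delta^0(q^0) \cap P \neq \emptyset\} \cup \{q^1 \in Q^1 \mid \delta^1(q^1) \cap P \neq \emptyset\}, \text{ and} \tag{1}$$
$$\mathrm{Pre}^{\forall}(P) = \{q^0 \in Q^0 \mid \delta^0(q^0) \subseteq P\} \cup \{q^1 \in Q^1 \mid \delta^1(q^1) \subseteq P\}, \tag{2}$$
$$\mathrm{Pre}^{0}(P) = \{q^0 \in Q^0 \mid \delta^0(q^0) \cap P \neq \emptyset\} \cup \{q^1 \in Q^1 \mid \delta^1(q^1) \subseteq P\}, \text{ and} \tag{3}$$
$$\mathrm{Pre}^{1}(P) = \{q^0 \in Q^0 \mid \delta^0(q^0) \subseteq P\} \cup \{q^1 \in Q^1 \mid \delta^1(q^1) \cap P \neq \emptyset\}. \tag{4}$$

Observe that $Q \setminus \mathrm{Pre}^{\exists}(P) = \mathrm{Pre}^{\forall}(Q \setminus P)$ and $Q \setminus \mathrm{Pre}^{1}(P) = \mathrm{Pre}^{0}(Q \setminus P)$.
We combine the operators in (1)-(4) to define a conditional predecessor
$\mathrm{CondPre}$ and its dual $\overline{\mathrm{CondPre}}$ for sets $P, P' \subseteq Q$ by

$$\mathrm{CondPre}(P, P') := \mathrm{Pre}^{\exists}(P) \cap \mathrm{Pre}^{1}(P \cup P'), \text{ and} \tag{5}$$
$$\overline{\mathrm{CondPre}}(P, P') := \mathrm{Pre}^{\forall}(P) \cup \mathrm{Pre}^{0}(P \cap P'). \tag{6}$$

We see that $Q \setminus \mathrm{CondPre}(P, P') = \overline{\mathrm{CondPre}}(Q \setminus P, Q \setminus P')$.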
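The set transformers (1)-(6) translate directly into operations on explicit state sets. The sketch below uses a small, made-up game graph (the state names, `DELTA`, and all function names are illustrative assumptions) and checks the three duality identities stated above by brute force over all subsets.

```python
from itertools import combinations

# Hypothetical 4-state game graph: environment (player 0) and system (player 1).
Q0 = frozenset({"e0", "e1"})
Q1 = frozenset({"s0", "s1"})
Q = Q0 | Q1
DELTA = {"e0": frozenset({"s0", "s1"}), "e1": frozenset({"s1"}),
         "s0": frozenset({"e0", "e1"}), "s1": frozenset({"e1"})}


def pre_exists(P):
    """Eq. (1): states with at least one successor in P."""
    return frozenset(q for q in Q if DELTA[q] & P)


def pre_forall(P):
    """Eq. (2): states all of whose successors are in P."""
    return frozenset(q for q in Q if DELTA[q] <= P)


def pre0(P):
    """Eq. (3): player 0 can choose a move into P; player 1 cannot avoid P."""
    return (frozenset(q for q in Q0 if DELTA[q] & P)
            | frozenset(q for q in Q1 if DELTA[q] <= P))


def pre1(P):
    """Eq. (4): player 1 can choose a move into P; player 0 cannot avoid P."""
    return (frozenset(q for q in Q0 if DELTA[q] <= P)
            | frozenset(q for q in Q1 if DELTA[q] & P))


def cond_pre(P, P2):
    """Eq. (5): some move reaches P, and player 1 can force P or P2."""
    return pre_exists(P) & pre1(P | P2)


def cond_pre_dual(P, P2):
    """Eq. (6): the dual of cond_pre."""
    return pre_forall(P) | pre0(P & P2)


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def dualities_hold():
    """Brute-force check of the three complement identities on this graph."""
    for P in subsets(Q):
        if Q - pre_exists(P) != pre_forall(Q - P):
            return False
        if Q - pre1(P) != pre0(Q - P):
            return False
        for P2 in subsets(Q):
            if Q - cond_pre(P, P2) != cond_pre_dual(Q - P, Q - P2):
                return False
    return True
```

An explicit-set encoding is chosen only for readability; a symbolic implementation would typically represent the sets by BDDs, which the paper does not detail in this section.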


μ-Calculus. We use the μ-calculus as a convenient logical notation used to
define a symbolic algorithm (i.e., an algorithm that manipulates sets of states
rather than individual states) for computing a set of states with a particular
property over a given game graph $H$. The formulas of the μ-calculus, interpreted
over a two-player game graph $H$, are given by the grammar

$$\varphi ::= p \mid X \mid \varphi \cup \varphi \mid \varphi \cap \varphi \mid \mathit{pre}(\varphi) \mid \mu X.\varphi \mid \nu X.\varphi$$

where $p$ ranges over subsets of $Q$, $X$ ranges over a set of formal variables,
$\mathit{pre} \in \{\mathrm{Pre}^{\exists}, \mathrm{Pre}^{\forall}, \mathrm{Pre}^{0}, \mathrm{Pre}^{1}, \mathrm{CondPre}, \overline{\mathrm{CondPre}}\}$ ranges over set transformers,
and $\mu$ and $\nu$ denote, respectively, the least and greatest fixpoint of the functional
defined as $X \mapsto \varphi(X)$. Since the operations $\cup$, $\cap$, and the set transformers $\mathit{pre}$
are all monotonic, the fixpoints are guaranteed to exist. A μ-calculus formula
evaluates to a set of states over $H$, and the set can be computed by induction
over the structure of the formula, where the fixpoints are evaluated by iteration.
We omit the (standard) semantics of formulas [19].
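Evaluating fixpoints "by iteration" means starting from the empty set (for μ) or the full state set (for ν) and applying the monotone functional until nothing changes. The following sketch shows this on a tiny, hypothetical directed graph; `lfp`, `gfp`, `EDGES` and the two example formulas are illustrative assumptions rather than the paper's implementation.

```python
def lfp(f):
    """Least fixpoint of a monotone set functional, iterated from the empty set."""
    current = frozenset()
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


def gfp(f, top):
    """Greatest fixpoint of a monotone set functional, iterated from `top`."""
    current = frozenset(top)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


# Hypothetical graph: 1 -> 2 -> 3 -> 3 and an isolated self-loop on 4.
EDGES = {1: {2}, 2: {3}, 3: {3}, 4: {4}}
STATES = frozenset(EDGES)


def pre_exists(P):
    """States with at least one successor in P."""
    return frozenset(q for q in STATES if EDGES[q] & P)


# mu X . {3} u Pre(X): states from which state 3 is reachable.
CAN_REACH_3 = lfp(lambda X: frozenset({3}) | pre_exists(X))

# nu X . safe n Pre(X): states that can stay inside `safe` forever.
SAFE = frozenset({1, 2, 4})
CAN_STAY_SAFE = gfp(lambda X: SAFE & pre_exists(X), STATES)
```

Nested formulas are handled by passing a functional that itself calls `lfp` or `gfp`; termination follows from finiteness of the state set and monotonicity of the functionals.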


2.2 The Considered Synthesis Problem 


The GR(1) synthesis problem asks to synthesize a winning strategy for the
system player (player 1) for a given GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ or determine
that no such strategy exists. This can be equivalently represented in terms of
ω-languages, by asking for a system strategy $f^1$ over $H$ s.t.

$$\emptyset \neq \mathcal{L}(H, f^1) \subseteq \overline{\mathcal{L}(H, \mathcal{F}_A)} \cup \mathcal{L}(H, \mathcal{F}_G).$$

That is, the system wins on plays $\pi \in \mathcal{L}(H, f^1)$ if either $\pi \notin \mathcal{L}(H, \mathcal{F}_A)$ or
$\pi \in \mathcal{L}(H, \mathcal{F}_A) \cap \mathcal{L}(H, \mathcal{F}_G)$. The only mechanism to ensure that sufficiently many
computations will result from $f^1$ is the usage of the environment input, which
enforces a minimal branching structure. However, the system could still win this
game by falsifying the assumptions; i.e., by generating plays $\pi \notin \mathcal{L}(H, \mathcal{F}_A)$ that
prevent the environment from fulfilling its liveness properties.

We suggest an alternative view to the usage of the assumptions on the
environment $\mathcal{F}_A$ in a GR(1) game. The condition $\mathcal{F}_A$ can be interpreted abstractly
as modeling an underlying mechanism that ensures that the environment player
(player 0) generates only inputs (possibly in response to observed outputs) that
conform with the given assumption. In this context, we would like to ensure
that the system (player 1) allows the environment, as much as possible, to
fulfill its liveness and only restricts the environment behavior if needed to enforce
the guarantees. We achieve this by forcing the system player to ensure that the
environment is always able to play such that it fulfills its liveness, i.e.

$$\mathrm{pfx}(\mathcal{L}(H, f^1)) = \mathrm{pfx}(\mathcal{L}(H, f^1) \cap \mathcal{L}(H, \mathcal{F}_A)).$$


As the ⊇-inclusion trivially holds, the constraint is given by the ⊆-inclusion.
Intuitively, the latter holds if every finite play $\alpha$ compliant with $f^1$ over $H$ can
be extended (by a suitable environment strategy) to an infinite play $\pi$ compliant
with $f^1$ that fulfills the environment liveness assumptions. It is easy to see that
not every solution to the GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ (in the classical sense) satisfies
this additional requirement. We therefore propose to synthesize a system strategy
$f^1$ with the above properties, as summarized in the following problem statement.


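Problem 1. Given a GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ synthesize a system strategy $f^1$
s.t.

$$\emptyset \neq \mathcal{L}(H, f^1) \subseteq \overline{\mathcal{L}(H, \mathcal{F}_A)} \cup \mathcal{L}(H, \mathcal{F}_G), \tag{7a}$$
$$\text{and } \mathrm{pfx}(\mathcal{L}(H, f^1)) = \mathrm{pfx}(\mathcal{L}(H, f^1) \cap \mathcal{L}(H, \mathcal{F}_A)) \tag{7b}$$

both hold, or verify that no such system strategy exists.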


Problem 1 asks for a strategy $f^1$ s.t. every play $\pi$ compliant with $f^1$ over
$H$ fulfills the system guarantees, i.e., $\pi \in \mathcal{L}(H, \mathcal{F}_G)$, if the environment
fulfills its liveness properties, i.e., if $\pi \in \mathcal{L}(H, \mathcal{F}_A)$ (from (7a)), while the
latter always remains possible (by a suitably playing environment) due to (7b).
Inspired by algorithms solving the supervisory controller synthesis problem for
non-terminating processes [23,27], we propose a solution to Problem 1 in terms
of a vectorized 4-nested fixed-point in the remaining part of this paper. We show
that Problem 1 can be solved by a finite-memory strategy, if a solution exists.

We note that (7b) is not a linear time but a branching time property and
can therefore not be “compiled away” into a different GR(1) or even ω-regular
objective. Satisfaction of (7b) requires checking whether the set $F_A$ remains
reachable from any reachable state in the game graph realizing $\mathcal{L}(H, f^1)$.¹
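For a memoryless system strategy and a single assumption set, the reachability reading of (7b) given above can be checked directly on the graph that results from fixing the system's choices. The sketch below is a minimal illustration under those assumptions; the game graph, the two strategies `HONEST` and `BLOCKING`, and all function names are hypothetical.

```python
# Hypothetical game: from s_b the system may continue to e_g or enter a trap.
DELTA = {
    "e0": {"s_a", "s_b"}, "e_g": {"s_g"}, "e_trap": {"s_trap"},
    "s_a": {"e0"}, "s_b": {"e_g", "e_trap"}, "s_g": {"e0"},
    "s_trap": {"e_trap"},
}
SYSTEM_STATES = {"s_a", "s_b", "s_g", "s_trap"}
F_A = {"s_b"}  # single environment liveness set (assumption)


def closed_loop(f1):
    """Graph obtained by fixing the memoryless system strategy f1."""
    return {q: ({f1[q]} if q in SYSTEM_STATES else set(succ))
            for q, succ in DELTA.items()}


def forward_reachable(graph, q0):
    """All states reachable from q0 (including q0)."""
    seen, stack = {q0}, [q0]
    while stack:
        for s in graph[stack.pop()]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def can_reach(graph, targets):
    """All states from which some target state is reachable (backward search)."""
    good = set(targets)
    changed = True
    while changed:
        changed = False
        for q, succ in graph.items():
            if q not in good and succ & good:
                good.add(q)
                changed = True
    return good


def is_non_conflicting(f1, q0="e0"):
    """From every reachable state, F_A must remain reachable (AG EF F_A)."""
    graph = closed_loop(f1)
    return forward_reachable(graph, q0) <= can_reach(graph, F_A)


HONEST = {"s_a": "e0", "s_b": "e_g", "s_g": "e0", "s_trap": "e_trap"}
BLOCKING = dict(HONEST, s_b="e_trap")  # moves into the trap after s_b
```

Here `BLOCKING` produces only plays that visit `F_A` finitely often, so it would satisfy (7a) vacuously while violating (7b); `HONEST` keeps `F_A` reachable forever.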


3 Algorithmic Solution for Singleton Winning Conditions 


We first consider the GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ with singleton winning conditions
$\mathcal{F}_A = \{F_A\}$ and $\mathcal{F}_G = \{F_G\}$, i.e., $n = m = 1$. It is well known that a system
winning strategy $f^1$ for this game can be synthesized by solving a three color
parity game over $H$. This can be expressed by the μ-calculus formula (see [15])

$$\varphi_3 := \nu Z \,.\, \mu Y \,.\, \nu X \,.\, (F_G \cap \mathrm{Pre}^1(Z)) \cup \mathrm{Pre}^1(Y) \cup ((Q \setminus F_A) \cap \mathrm{Pre}^1(X)). \tag{8}$$


¹ It can indeed be expressed by the CTL* formula AG EF $F_A$ (see [13], Sect. 3.3.2).
It follows that $q_0 \in \llbracket \varphi_3 \rrbracket$ if and only if the synthesis problem has a solution
and the winning strategy $f^1$ is obtained from a ranking argument over the sets
computed during the evaluation of (8).

To obtain a system strategy $f^1$ solving Problem 1 instead, we propose to
extend (8) to a 4-nested fixed-point expressed by the μ-calculus formula

$$\varphi_4 = \nu Z \,.\, \mu Y \,.\, \nu X \,.\, \mu W \,.\, (F_G \cap \mathrm{Pre}^1(Z)) \cup \mathrm{Pre}^1(Y) \cup ((Q \setminus F_A) \cap \mathrm{CondPre}(W, X \setminus F_A)). \tag{9}$$


Compared to (8) this adds an inner-most least fixed-point and substitutes
the last controllable pre-operator by the conditional one. Intuitively, this
distinguishes between states from which player 1 can force visiting $F_G$ and states from
which player 1 can force avoiding $F_A$. This is in contrast to (8) and allows us to
exclude strategies that allow player 1 to win by falsifying the assumptions.

The remainder of this section shows that $q_0 \in \llbracket \varphi_4 \rrbracket$ if and only if Problem 1
has a solution and the winning strategy $f^1$ fulfilling (7a) and (7b) can be obtained
from a ranking argument over the sets computed during the evaluation of (9).
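The difference between (8) and (9) can be observed on a small explicit-state game. The sketch below evaluates both formulas by naive nested iteration on a hypothetical graph in which the states `e1` and `s1` can only win the classical GR(1) game by moving into a trap where `F_A` is never visited again. The graph, the sets `F_A` and `F_G`, and all function names are illustrative assumptions, not the paper's benchmark or implementation.

```python
Q0 = frozenset({"e0", "e_g", "e_trap", "e1"})           # environment states
Q1 = frozenset({"s_a", "s_b", "s_g", "s_trap", "s1"})   # system states
Q = Q0 | Q1
DELTA = {
    "e0": {"s_a", "s_b"}, "e_g": {"s_g"}, "e_trap": {"s_trap"}, "e1": {"s1"},
    "s_a": {"e0"}, "s_b": {"e_g", "e_trap"}, "s_g": {"e0"},
    "s_trap": {"e_trap"}, "s1": {"e1", "e_trap"},
}
F_A = frozenset({"s_b", "e1"})   # one assumption set (singleton condition)
F_G = frozenset({"e_g"})         # one guarantee set


def pre_exists(P):
    return frozenset(q for q in Q if DELTA[q] & P)


def pre1(P):
    return (frozenset(q for q in Q0 if DELTA[q] <= P)
            | frozenset(q for q in Q1 if DELTA[q] & P))


def cond_pre(P, P2):
    return pre_exists(P) & pre1(P | P2)


def lfp(f):
    cur = frozenset()
    while True:
        nxt = frozenset(f(cur))
        if nxt == cur:
            return cur
        cur = nxt


def gfp(f):
    cur = Q
    while True:
        nxt = frozenset(f(cur))
        if nxt == cur:
            return cur
        cur = nxt


def phi3():
    """Eq. (8): classical 3-nested GR(1) fixed point."""
    return gfp(lambda Z: lfp(lambda Y: gfp(lambda X:
        (F_G & pre1(Z)) | pre1(Y) | ((Q - F_A) & pre1(X)))))


def phi4():
    """Eq. (9): 4-nested fixed point with the conditional predecessor."""
    return gfp(lambda Z: lfp(lambda Y: gfp(lambda X: lfp(lambda W:
        (F_G & pre1(Z)) | pre1(Y)
        | ((Q - F_A) & cond_pre(W, X - F_A))))))
```

On this graph `phi3()` contains every state, including the trap itself, whereas `phi4()` keeps only the states from which the guarantee can be achieved while `F_A` stays reachable. The naive restart of inner fixpoints is chosen for clarity and is not meant to reflect the complexity bound discussed in the paper.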


Soundness 

We prove soundness of (9) by showing that every state $q \in \llbracket \varphi_4 \rrbracket$ is winning for
the system player. In view of Problem 1 this requires to show that there exists
a system strategy $f^1$ s.t. all plays starting in a state $q \in \llbracket \varphi_4 \rrbracket$ and evolving in
accordance to $f^1$ result in an infinite play that fulfills (7a) and (7b).

We start by defining $f^1$ from a ranking argument over the iterations of (9).
Consider the last iteration of the fixed-point in (9) over $Z$. As (9) terminates
after this iteration we have $Z = Z^\infty = \llbracket \varphi_4 \rrbracket$. Assume that the fixed point over $Y$
is reached after $k$ iterations. If $Y^i$ is the set obtained after the $i$-th iteration, we
have that $Z^\infty = \bigcup_{i=0}^{k} Y^i$ with $Y^i \subseteq Y^{i+1}$, $Y^0 = \emptyset$ and $Y^k = Z^\infty$. Furthermore,
let $X^i = Y^i$ denote the fixed-point of the iteration over $X$ resulting in $Y^i$ and
denote by $W^i_j$ the set obtained in the $j$-th iteration over $W$ performed while using
the value $X^i$ for $X$ and $Y^{i-1}$ for $Y$. Then it holds that $Y^i = X^i = \bigcup_{j=0}^{l_i} W^i_j$
with $W^i_j \subseteq W^i_{j+1}$, $W^i_0 = \emptyset$ and $W^i_{l_i} = Y^i$ for all $i \in [0; k]$.
Using these sets, we define a ranking for every state $q \in Z^\infty$ s.t.

$$\mathrm{rank}(q) = (i, j) \text{ iff } q \in (Y^i \setminus Y^{i-1}) \cap (W^i_j \setminus W^i_{j-1}) \text{ for } i, j > 0. \tag{10}$$

We order ranks lexicographically. It further holds that (see [21])

$$q \in D \;\Leftrightarrow\; \mathrm{rank}(q) = (1,1) \;\Rightarrow\; q \in F_G \cap Z^\infty \tag{11a}$$
$$q \in E^i \;\Leftrightarrow\; \mathrm{rank}(q) = (i,1) \wedge i > 1 \;\Leftrightarrow\; q \in (F_A \setminus F_G) \cap Z^\infty \tag{11b}$$
$$q \in R^i_j \;\Leftrightarrow\; \mathrm{rank}(q) = (i,j) \wedge j > 1 \;\Leftrightarrow\; q \in Z^\infty \setminus (F_A \cup F_G), \tag{11c}$$

where $D$, $E^i$ and $R^i_j$ denote the sets added to the winning state set by the first,
second and third term of (9), respectively, in the corresponding iteration.


Figure 3 (left) shows a schematic representation of this construction for an
example with $k = 3$, $l_1 = 4$, $l_2 = 2$ and $l_3 = 3$. The set $D = F_G \cap Z^\infty$ is
Fig. 3. Schematic representation of the ranking defined in (10) (left) and in (16) (right).
Diamond, ellipses and rectangles represent the sets $D$, $E^i$ and $R^i_j$, while blue, green
and red indicate the sets $Y^1$, $Y^2 \setminus Y^1$ and $Y^3 \setminus Y^2$ (annotated by mode indices for the right
figure). Labels $(i, j)$ and $(a, i, b, j)$ indicate that all states $q$ associated with this set fulfill
$\mathrm{rank}(q) = (i, j)$ and ${}^{ab}\mathrm{rank}(q) = (i, j)$, respectively. Solid, colored arcs indicate
system-enforceable moves, dotted arcs indicate existence of environment or system transitions
and dashed arcs indicate possible existence of environment transitions. (Color figure
online)


represented by the diamond at the top where the label (1,1) denotes the
associated rank (see (11a)). The ellipses represent the sets $E^i \subseteq (F_A \setminus F_G) \cap Z^\infty$,
where the corresponding $i > 1$ is indicated by the associated rank $(i, 1)$. Due to
the use of the controllable pre-operator in the first and second term of (9), it is
ensured that progress out of $D$ and $E^i$ can be enforced by the system, indicated
by the solid arrows. This is in contrast to all states in $R^i_j \subseteq Z^\infty \setminus F_A \setminus F_G$, which
are represented by the rectangular shapes in Fig. 3 (left). These states allow the
environment to increase the ranking (dashed lines) as long as $Z^\infty \setminus F_A \setminus F_G$ is not
left and there exists a possible move to decrease the $j$-rank (dotted lines). While
this does not strictly enforce progress, we see that whenever the environment
plays such that states in $F_A$ (i.e., the ellipses) are visited infinitely often (i.e., the
environment fulfills its assumptions), the system can enforce progress w.r.t. the
defined ranking and a state in $F_G$ (i.e., the diamond shape) is eventually visited.
The system is restricted to take the existing solid or dotted transitions in
Fig. 3 (left). With this, it is easy to see that the constructed strategy is winning
if the environment fulfills its assumptions, i.e., (7a) holds. However, to ensure
that (7b) also holds, we need an additional requirement. This is necessary as
the used construction also allows plays to cycle through the blue region of Fig. 3
(left) only, thereby not necessarily visiting states in $F_A$ infinitely often. However,
if $\mathcal{L}(H, \mathcal{F}_G) \subseteq \mathcal{L}(H, \mathcal{F}_A)$ we see that (7b) holds as well. It should be noted that
the latter is a sufficient condition which can be easily checked symbolically on
the problem instance but not a necessary one.

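Based on the ranking in (10) we define a memory-less system strategy $f^1 :
Q^1 \cap Z^\infty \to Q^0 \subseteq \delta^1$ s.t. the rank is always decreased, i.e.,

$$q' = f^1(q) \;\Rightarrow\; \begin{cases} \mathrm{rank}(q') < \mathrm{rank}(q), & \mathrm{rank}(q) > (1,1) \\ q' \in Z^\infty, & \text{otherwise.} \end{cases} \tag{12}$$

The next theorem shows that this strategy indeed solves Problem 1.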
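The ranking (10) and the rank-decreasing choice in (12) can be extracted by re-running the Y- and W-iterations once the outer fixed point is known. The sketch below does this for a small, hypothetical game with singleton conditions; the graph, the tie-breaking by state name, and all function names are assumptions made for illustration only.

```python
Q0 = frozenset({"e0", "e_g", "e_trap"})
Q1 = frozenset({"s_a", "s_b", "s_g", "s_trap"})
Q = Q0 | Q1
DELTA = {"e0": {"s_a", "s_b"}, "e_g": {"s_g"}, "e_trap": {"s_trap"},
         "s_a": {"e0"}, "s_b": {"e_g", "e_trap"}, "s_g": {"e0"},
         "s_trap": {"e_trap"}}
F_A, F_G = frozenset({"s_b"}), frozenset({"e_g"})

def pre_exists(P):
    return frozenset(q for q in Q if DELTA[q] & P)

def pre1(P):
    return (frozenset(q for q in Q0 if DELTA[q] <= P)
            | frozenset(q for q in Q1 if DELTA[q] & P))

def cond_pre(P, P2):
    return pre_exists(P) & pre1(P | P2)

def fix(f, start):
    """Iterate a monotone functional from `start` until it stabilizes."""
    cur = frozenset(start)
    while True:
        nxt = frozenset(f(cur))
        if nxt == cur:
            return cur
        cur = nxt

def omega(Z, Y, X, W):
    """Body of Eq. (9)."""
    return (F_G & pre1(Z)) | pre1(Y) | ((Q - F_A) & cond_pre(W, X - F_A))

def solve_x(Z, Y):
    """nu X . mu W . omega for fixed Z and Y."""
    return fix(lambda X: fix(lambda W: omega(Z, Y, X, W), ()), Q)

def winning_set():
    return fix(lambda Z: fix(lambda Y: solve_x(Z, Y), ()), Q)

def ranking(z_inf):
    """rank(q) = (i, j) as in Eq. (10), from the last iteration over Z."""
    rank, y_prev, i = {}, frozenset(), 0
    while True:
        y_i = solve_x(z_inf, y_prev)
        if y_i == y_prev:
            return rank
        i += 1
        w_prev, j = frozenset(), 0
        while True:
            w_j = omega(z_inf, y_prev, y_i, w_prev)
            if w_j == w_prev:
                break
            j += 1
            for q in (w_j - w_prev) - y_prev:
                rank[q] = (i, j)
            w_prev = w_j
        y_prev = y_i

def strategy(rank):
    """Memoryless choice following Eq. (12); ties broken by state name."""
    f1 = {}
    for q in sorted(Q1 & set(rank)):
        succ = [s for s in DELTA[q] if s in rank]
        if rank[q] > (1, 1):
            succ = [s for s in succ if rank[s] < rank[q]]
        f1[q] = min(succ, key=lambda s: (rank[s], s))
    return f1

RANK = ranking(winning_set())
F1 = strategy(RANK)
```

In this example the guarantee state `e_g` receives rank (1,1), the assumption state `s_b` receives rank (2,1), and the remaining winning states receive ranks with j > 1, in line with the classification in (11a)-(11c). The extracted strategy never moves into the trap, since trap states carry no rank.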


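Theorem 1. Let $(H, \mathcal{F}_A, \mathcal{F}_G)$ be a GR(1) game with singleton winning
conditions $\mathcal{F}_A = \{F_A\}$ and $\mathcal{F}_G = \{F_G\}$. Suppose $f^1$ is the system strategy in (12)
based on the ranking in (10). Then it holds for all $q \in \llbracket \varphi_4 \rrbracket$ that²

$$\mathcal{L}_q(H, f^1) \subseteq \overline{\mathcal{L}_q(H, \mathcal{F}_A)} \cup \mathcal{L}_q(H, \mathcal{F}_G), \tag{13a}$$
$$\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_G) \neq \emptyset, \text{ and} \tag{13b}$$
$$\mathcal{L}_q(H, \mathcal{F}_G) \subseteq \mathcal{L}_q(H, \mathcal{F}_A) \;\Rightarrow\; \mathrm{pfx}(\mathcal{L}_q(H, f^1)) = \mathrm{pfx}(\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_A)). \tag{13c}$$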


Completeness 

We show completeness of (9) by establishing that every state $q \in Q \setminus \llbracket \varphi_4 \rrbracket = \llbracket \overline{\varphi}_4 \rrbracket$
is losing for the system player. In view of Problem 1 this requires to show that for
all $q \in \llbracket \overline{\varphi}_4 \rrbracket$ and all system strategies $f^1$ either (7a) or (7b) does not hold. This is
formalized in [21] by first negating the fixed-point in (9) and deriving the induced
ranking of this negated fixed-point. Using this ranking, we first show that the
environment can (i) render the negated winning set $\overline{Z}^\infty$ invariant and (ii) can
always enforce the play to visit $F_G$ only finitely often, resulting in a violation
of the guarantees. Using these observations we finally show that whenever (7a)
holds for an arbitrary system strategy $f^1$ starting in $\llbracket \overline{\varphi}_4 \rrbracket$, then (7b) cannot hold.
With this, completeness, as formalized in the following theorem, directly follows.


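Theorem 2. Let $(H, \mathcal{F}_A, \mathcal{F}_G)$ be a GR(1) game with singleton winning
conditions $\mathcal{F}_A = \{F_A\}$ and $\mathcal{F}_G = \{F_G\}$. Then it holds for all $q \in \llbracket \overline{\varphi}_4 \rrbracket$ and all system
strategies $f^1$ over $H$ that either

$$\emptyset \neq \mathcal{L}_q(H, f^1) \subseteq \overline{\mathcal{L}_q(H, \mathcal{F}_A)} \cup \mathcal{L}_q(H, \mathcal{F}_G), \text{ or} \tag{14a}$$
$$\mathrm{pfx}(\mathcal{L}_q(H, f^1)) = \mathrm{pfx}(\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_A)) \text{ does not hold.} \tag{14b}$$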


² Given a state $q \in Q = Q^0 \cup Q^1$ we use the subscript $q$ to denote that the respective
set of plays is defined by using $q$ as the initial state of $H$.
A Solution for Problem 1

We note that the additional assumption in Theorem 1 is required only to ensure
that the resulting strategy fulfills (7b). Suppose that this assumption holds for
the initial state $q_0$ of $H$. That is, consider a GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ with
singleton winning conditions $\mathcal{F}_A = \{F_A\}$ and $\mathcal{F}_G = \{F_G\}$ s.t. $\mathcal{L}(H, \mathcal{F}_G) \subseteq \mathcal{L}(H, \mathcal{F}_A)$.
Then it follows from Theorem 2 that Problem 1 has a solution iff $q_0 \in \llbracket \varphi_4 \rrbracket$.
Furthermore, if $q_0 \in \llbracket \varphi_4 \rrbracket$, based on the intermediate values maintained for the
computation of $\varphi_4$ in (10) and the ranking defined in (12), we can construct $f^1$
that wins the GR(1) condition in (7a) and is non-conflicting, as in (7b).

We can check symbolically whether $\mathcal{L}(H, \mathcal{F}_G) \subseteq \mathcal{L}(H, \mathcal{F}_A)$. For this we
construct a game graph $H'$ from $H$ by removing all states in $F_A$, and then check
whether $\mathcal{L}(H', \mathcal{F}_G)$ is empty. The latter is decidable in logarithmic space and
polynomial time. If this check fails, then $\mathcal{L}(H, \mathcal{F}_G) \not\subseteq \mathcal{L}(H, \mathcal{F}_A)$. Furthermore,
we can replace $\mathcal{L}(H, \mathcal{F}_G)$ in (7a) by $\mathcal{L}(H, \mathcal{F}_G) \cap \mathcal{L}(H, \mathcal{F}_A)$ without affecting the
restriction (7a) imposes on the choice of $f^1$. Given singleton winning conditions
$\mathcal{F}_G$ and $\mathcal{F}_A$, we see that $\mathcal{L}(H, \mathcal{F}_G) \cap \mathcal{L}(H, \mathcal{F}_A) = \mathcal{L}(H, \{F_G, F_A\})$ and it
trivially holds that $\mathcal{L}(H, \{F_G, F_A\}) \subseteq \mathcal{L}(H, \mathcal{F}_A)$. That is, we fulfill the conditional
by replacing the system guarantee $\mathcal{L}(H, \mathcal{F}_G)$ by $\mathcal{L}(H, \{F_G, F_A\})$. However, this
results in a GR(1) synthesis problem with $m = 1$ and $n = 2$, which we discuss
next.
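One explicit-state way to realize the inclusion check for singleton conditions is to search for a counterexample: a reachable cycle that visits a state of `F_G` while avoiding `F_A` entirely. The sketch below takes this route; it adds an explicit reachability test from the initial state, which is an assumption on top of the construction described above, and the graph and function names are hypothetical.

```python
DELTA = {
    "e0": {"s_a", "s_b"}, "e_g": {"s_g"}, "e_trap": {"s_trap"},
    "s_a": {"e0"}, "s_b": {"e_g", "e_trap"}, "s_g": {"e0"},
    "s_trap": {"e_trap"},
}


def reachable_in(delta, start_states, allowed):
    """States in `allowed` reachable from start_states in one or more steps,
    where every visited successor must lie in `allowed`."""
    seen = set()
    stack = [s for q in start_states for s in delta[q] if s in allowed]
    while stack:
        q = stack.pop()
        if q in seen:
            continue
        seen.add(q)
        stack.extend(s for s in delta[q] if s in allowed and s not in seen)
    return seen


def guarantee_implies_assumption(delta, q0, f_a, f_g):
    """True iff no play from q0 visits f_g infinitely often and f_a only
    finitely often (singleton Buechi conditions, finite graph)."""
    states = set(delta)
    reachable = {q0} | reachable_in(delta, {q0}, states)
    avoid_a = states - set(f_a)          # the graph H' without F_A states
    for g in set(f_g) & avoid_a & reachable:
        if g in reachable_in(delta, {g}, avoid_a):
            return False                 # g lies on a cycle that avoids F_A
    return True
```

For the graph above with `F_A = {"s_b"}`, every cycle through `e_g` passes through `s_b`, so the inclusion holds for `F_G = {"e_g"}`; choosing `F_G = {"e_trap"}` instead yields the counterexample cycle between `e_trap` and `s_trap`.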


4 Algorithmic Solution for GR(1) Winning Conditions 


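We now consider a general GR(1) game $(H, \mathcal{F}_A, \mathcal{F}_G)$ with $\mathcal{F}_A = \{{}^1F_A, \ldots, {}^mF_A\}$
and $\mathcal{F}_G = \{{}^1F_G, \ldots, {}^nF_G\}$ s.t. $n, m > 1$. The known fixed-point for solving GR(1)
games in [6] rewrites the three nested fixed-point in (8) in a vectorized version,
which induces an order on the guarantee sets in $\mathcal{F}_G$ and adds a disjunction over
all assumption sets in $\mathcal{F}_A$ to every line of this vectorized fixed-point. Adapting
the same idea to the 4-nested fixed-point algorithm (9) results in

$$\varphi_4^v = \nu \begin{bmatrix} {}^1Z \\ {}^2Z \\ \vdots \\ {}^nZ \end{bmatrix} . \begin{bmatrix} \mu\, {}^1Y \,.\, \left( \bigcup_{b=1}^{m} \nu\, {}^{1b}X \,.\, \mu\, {}^{1b}W \,.\, {}^{1b}\Omega \right) \\ \mu\, {}^2Y \,.\, \left( \bigcup_{b=1}^{m} \nu\, {}^{2b}X \,.\, \mu\, {}^{2b}W \,.\, {}^{2b}\Omega \right) \\ \vdots \\ \mu\, {}^nY \,.\, \left( \bigcup_{b=1}^{m} \nu\, {}^{nb}X \,.\, \mu\, {}^{nb}W \,.\, {}^{nb}\Omega \right) \end{bmatrix}, \tag{15}$$

where ${}^{ab}\Omega = ({}^aF_G \cap \mathrm{Pre}^1({}^{a^+}Z)) \cup \mathrm{Pre}^1({}^aY) \cup ((Q \setminus {}^bF_A) \cap \mathrm{CondPre}({}^{ab}W, {}^{ab}X \setminus {}^bF_A))$
and $a^+$ denotes $(a \bmod n) + 1$.

The remainder of this section shows how soundness and completeness carry
over from the 4-nested fixed-point algorithm (9) to its vectorized version in (15).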


Soundness and Completeness 

We refer to intermediate sets obtained during the computation of the fixpoints by
similar notations as in Sect. 3. For example, the set ${}^aY^i$ is the $i$-th approximation
of the fixpoint computing ${}^aY$ and ${}^{ab}W^i_j$ is the $j$-th approximation of ${}^{ab}W$ while
computing the $i$-th approximation of ${}^aY$, i.e., computing ${}^aY^i$ and using ${}^aY^{i-1}$.
Similar to the above, we define a mode-based rank for every state $q \in {}^aZ^\infty$; we
track the currently chased guarantee $a \in [1; n]$ (similar to [6]) and the currently
avoided assumption set $b \in [1; m]$ as an additional internal mode. In analogy to
(10) we define

$${}^{ab}\mathrm{rank}(q) = (i, j) \text{ iff } q \in ({}^aY^i \setminus {}^aY^{i-1}) \cap ({}^{ab}W^i_j \setminus {}^{ab}W^i_{j-1}) \text{ for } i, j > 0. \tag{16}$$

Again, we order ranks lexicographically, and, in analogy to (11a), (11b) and
(11c), we have

$$q \in {}^aD \;\Leftrightarrow\; {}^{a\cdot}\mathrm{rank}(q) = (1,1) \;\Rightarrow\; q \in {}^aF_G, \tag{17a}$$
$$q \in {}^aE^i \;\Leftrightarrow\; {}^{a\cdot}\mathrm{rank}(q) = (i,1) \wedge i > 1, \tag{17b}$$
$$q \in {}^{ab}R^i_j \;\Leftrightarrow\; {}^{ab}\mathrm{rank}(q) = (i,j) \wedge j > 1 \;\Rightarrow\; q \notin {}^bF_A. \tag{17c}$$

The sets ${}^aY^i$, ${}^{ab}W^i_j$, ${}^aD$, ${}^aE^i$ and ${}^{ab}R^i_j$ are interpreted in direct analogy to Sect. 3,
where $a$ and $b$ annotate the used line and conjunct in (15).

Figure 3 (right) shows a schematic representation of the ranking for an exam- 
ple with “ = 3, all = 0, a2 = 4, ax = 2, ala = 2, al. = 3, a23 = 0, and 
a3] — 2. Again, the set D C °Fg is represented by the diamond at the top of the 
figure. Similarly, all ellipses represent sets ${}^aE^i$ added in the $i$-th iteration over
line $a$ of (15). Again, progress out of ellipses can be enforced by the system,
indicated by the solid arrows leaving those shapes. However, this might not preserve
the current $b$ mode. It might be the environment choosing which assumption to
avoid next. Further, the environment might choose to change the $b$ mode along
with decreasing the $i$-rank, as indicated by the colored dashed lines³. Finally,
the interpretation of the sets represented by rectangular shapes in Fig. 3 (right),
corresponding to (17c), is in direct analogy to the case with singleton winning
conditions. It should be noticed that this is the only place where we preserve the
current $b$-mode when constructing a strategy.

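Using this intuition we define a system strategy that uses enforceable and
existing transitions to decrease the rank if possible and preserves the current $a$
mode until the diamond shape is reached. The $b$ mode is only preserved within
rectangular sets. This is formalized by a strategy

$$f^1 : \bigcup_{a \in [1;n]} \left( (Q^1 \cap {}^aZ^\infty) \times a \times [1;m] \right) \to Q^0 \times [1;n] \times [1;m] \tag{18a}$$

s.t. $(q', \cdot, \cdot) = f^1(q, \cdot, \cdot)$ implies $q' \in \delta^1(q)$ and $(q', a', b') = f^1(q, a, b)$ implies

$$\begin{cases} q' \in {}^{a'}Z^\infty \wedge a' = a^+, & {}^{a\cdot}\mathrm{rank}(q) = (1,1) \\ {}^{a'b'}\mathrm{rank}(q') \leq (i-1, \cdot) \wedge a' = a, & {}^{a\cdot}\mathrm{rank}(q) = (i,1),\ i > 1 \\ {}^{a'b'}\mathrm{rank}(q') \leq (i, j-1) \wedge a' = a \wedge b' = b, & {}^{ab}\mathrm{rank}(q) = (i,j),\ j > 1. \end{cases} \tag{18b}$$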


³ The strategy extraction in (18a) and (18b) prevents the system from choosing a
different $b$ mode. The strategy choice could be optimized w.r.t. fast progress towards
${}^aF_G$ in such cases.
We say that a play m over H is compliant with f1 if there exist mode traces a € 
[1;n]* and 8 € [1; m]* s.t. for all k € N holds (7(2k + 2), a(2k + 2), 6(2k+2)) = 
fi(a(2k + 1), a(2k +1), B(2k +1)), and (i) a(2k +1) = a(2k)t if “rank(a(2k + 
1)) = (1,1), (ii) a(2k + 1) = a(2k) if *rank(a(2k + 1)) = (i,1),i > 1, and (iii) 
a(2k + 1) = a(2k) and B(2k + 1) = G(2k) if *rank(a(2k + 1)) = (i, j), j > 1. 

With this it is easy to see that the intuition behind Theorem 1 directly carries 
over to every line of (15). Additionally, using Pre (tZ ) in *D allows to cycle 
through all the lines of (15), which ensures that the constructed system strategy tries to
attain every set $^{a}F_G \in \mathcal{F}_G$ in a pre-defined order. See [21]
for a formalization of this intuition and a detailed proof. 

To prove completeness, it is also shown in [21] that the negation of (15) can 
be over-approximated by negating every line separately. Therefore, the reasoning 
for every line of the negated fixed-point carries over from Sect. 3, resulting in the 
analogous completeness result. With this we obtain soundness and completeness 
in direct analogy to Theorems 1-2, formalized in Theorem 3. 


Theorem 3. Let $(H, \mathcal{F}_A, \mathcal{F}_G)$ be a GR(1) game with $\mathcal{F}_A = \{{}^{1}F_A, \ldots, {}^{m}F_A\}$ and
$\mathcal{F}_G = \{{}^{1}F_G, \ldots, {}^{n}F_G\}$. Suppose $f^1$ is the system strategy in (18a) and (18b) based
on the ranking in (16). Then it holds for all q € [p3] that (13a), (13b) and (13c) 
hold. Furthermore, it holds for all q ¢ |p}] and all system strategies f over H 
that either (14a) or (14b) does not hold. 


A Solution for Problem 1 

Given that L(H, Fg) C L(H, Fa) it follows from Theorem 3 that Problem 1 has 
a solution iff qo € [p3]. Furthermore, if go € [yi] we can construct f! that wins 
the GR(1) condition in (7a) and is non-conflicting, as in (7b). 

Using a construction similar to the one in Sect. 3, we can symbolically check whether
$L(H, \mathcal{F}_G) \subseteq L(H, \mathcal{F}_A)$. For this, we construct a new game graph $H_b$ for every
$^{b}F_A$, $b \in [1;m]$, by removing the latter set from the state set of H and checking
whether $L(H_b, \mathcal{F}_G)$ is empty. If some of these m checks fail, we have
$L(H, \mathcal{F}_G) \not\subseteq L(H, \mathcal{F}_A)$. Now observe that by checking every $^{b}F_A$ separately, we know which
goals are not necessarily passed by infinite runs which visit all $^{a}F_G$ infinitely often
and can collect them in the set $\mathcal{F}^{\mathrm{failed}}$. Using the same reasoning as in Sect. 3, we
can simply add the set $\mathcal{F}^{\mathrm{failed}}$ to the system guarantee set to obtain an equivalent
synthesis problem which is solvable by the given algorithm, if it is realizable.
More precisely, consider the new system guarantee set $\mathcal{F}'_G = \mathcal{F}_G \cup \mathcal{F}^{\mathrm{failed}}$ and
observe that $L(H, \mathcal{F}'_G) \subseteq L(H, \mathcal{F}_A)$ by definition, and therefore substituting
$L(H, \mathcal{F}_G)$ by $L(H, \mathcal{F}'_G)$ in (7a) does not change the satisfaction of the given
inclusion.
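
The inclusion check described above can be illustrated on an explicit graph. The following is a minimal enumerative sketch, not the symbolic check used by the authors. It assumes that the language is taken over runs from arbitrary start states (restrict the search to states reachable from the initial state if one is fixed) and that the two players' states need not be distinguished for this check. All names (`Graph`, `reachPlus`, `hasFairCycle`, `guaranteesImplyAssumptions`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Adjacency lists over states 0..n-1 (players are not distinguished here).
using Graph = std::vector<std::vector<int>>;

// States reachable from `src` in one or more steps without entering removed states.
std::vector<bool> reachPlus(const Graph& g, const std::vector<bool>& removed, int src) {
    std::vector<bool> seen(g.size(), false);
    std::queue<int> work;
    work.push(src);
    while (!work.empty()) {
        const int q = work.front();
        work.pop();
        for (int succ : g[static_cast<std::size_t>(q)]) {
            const std::size_t s = static_cast<std::size_t>(succ);
            if (!removed[s] && !seen[s]) {
                seen[s] = true;
                work.push(succ);
            }
        }
    }
    return seen;
}

// True iff the graph without `removed` contains a cycle whose strongly connected
// component meets every goal set, i.e. some run visits all goal sets infinitely often.
bool hasFairCycle(const Graph& g, const std::vector<bool>& removed,
                  const std::vector<std::vector<int>>& goals) {
    const std::size_t n = g.size();
    std::vector<std::vector<bool>> rp(n, std::vector<bool>(n, false));
    for (std::size_t q = 0; q < n; ++q)
        if (!removed[q]) rp[q] = reachPlus(g, removed, static_cast<int>(q));
    for (std::size_t s = 0; s < n; ++s) {
        if (removed[s] || !rp[s][s]) continue;  // s must lie on a cycle
        bool allGoals = true;
        for (const auto& goal : goals) {
            bool hit = false;
            for (int tq : goal) {
                const std::size_t t = static_cast<std::size_t>(tq);
                if (!removed[t] && rp[s][t] && rp[t][s]) hit = true;
            }
            allGoals = allGoals && hit;
        }
        if (allGoals) return true;
    }
    return false;
}

// Inclusion holds iff, for every assumption set, removing it leaves no fair cycle.
bool guaranteesImplyAssumptions(const Graph& g,
                                const std::vector<std::vector<int>>& goals,
                                const std::vector<std::vector<int>>& assumptions) {
    for (const auto& fa : assumptions) {
        std::vector<bool> removed(g.size(), false);
        for (int q : fa) {
            assert(q >= 0 && static_cast<std::size_t>(q) < g.size());
            removed[static_cast<std::size_t>(q)] = true;
        }
        if (hasFairCycle(g, removed, goals)) return false;
    }
    return true;
}
```

The assumption sets for which `hasFairCycle` returns true are exactly the ones that would have to be added to the guarantee set in the construction described above.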


5 Complexity Analysis 


We show that the search for a more elaborate strategy does not affect the worst 
case complexity. In Sect. 6 we show that this is also the case in practice. We state 
this complexity formally below. 
Theorem 4. Let $(H, \mathcal{F}_A, \mathcal{F}_G)$ be a GR(1) game. We can check whether there
is a winning non-conflicting strategy $f^1$ by a symbolic algorithm that performs
$O(|Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$ next step computations and by an enumerative algorithm that
works in time $O(m |Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$, where m is the number of transitions of the
game.


Proof. Each line of the fixed-point is iterated $O(|Q|^2)$ times [8]. As there are
$|\mathcal{F}_G||\mathcal{F}_A|$ lines the upper bound follows. As we have to compute $|\mathcal{F}_G||\mathcal{F}_A|$ different
ranks for each state, it follows that the complexity is $O(m |Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$.


We note that enumeratively our approach is theoretically worse than the 
classical approach to GR(1). This follows from the straightforward reduction to
the rank computation in the rank lifting algorithm and the relative complexity 
of the new rank when compared to the general GR(1) rank. We conjecture that 
more complex approaches, e.g., through a reduction to a parity game and the 
usage of other enumerative algorithms, could eliminate this gap. 


6 Experiments 


We have implemented the 4-nested fixed-point algorithm in (15) and the corre- 
sponding strategy extraction in (18a) and (18b). It is available as an extension to 
the GR(1) synthesis tool slugs [14]. In this section we show how this algorithm 
(called 4FP) performs in comparison to the usual 3-nested fixed-point algorithm 
for GR(1) synthesis (called 3FP) available in slugs. All experiments were run 
on a computer with a 2 GHz Intel i5 processor running x86 Linux, with
8 GB of memory.

We first run both algorithms on a benchmark set obtained from the maze 
example in the introduction by changing the number of rows and columns of 
the maze. We first increased the number of lines in the maze and added a goal 
state for both the obstacle and the robot per line. This results in a maze where 
in the first and last column, system and environment goals alternate and all 
adjacent cells are separated by a horizontal wall. Hence, both players need to 
cross the one-cell wide white space in the middle infinitely often to visit all 
their goal states infinitely often. The computation times and the number of 
states in the resulting strategy are shown in Table 1, upper part, columns 3-6.
Interestingly, we see that the 3FP always returns a strategy that blocks the
environment. In contrast, the non-conflicting strategies computed by the 4FP are
larger (in number of states) and are computed about 10 times slower than those
of the 3FP (compare columns 3-4 and 5-6). When increasing the number of
columns instead (lower part of Table 1), the number of goals is unaffected. We 
made the maze wider and left only a one-cell wide passage in the middle of the 
maze to allow crossings between its upper and lower row. Still, the 3FP only 
returns strategies that falsify the assumption, which have fewer states and are 
computed much faster than the environment respecting strategy returned by the 
4FP. Unfortunately, the speed of computing a strategy or its size is immaterial 
if the winning strategy so computed wins only by falsifying assumptions. 
To rule out the discrepancy between the two algorithms w.r.t. the size of
strategies, we slightly modified the above maze benchmark s.t. the environment
assumptions are not falsifiable anymore. We increased the capabilities of the
obstacle by allowing it to move at most 2 steps in each round and to "jump
over" the robot. Under these assumptions we repeated the above experiments.
The computation times and the number of states in the resulting strategy are
shown in Table 1, columns 9-12. We see that in this case the sizes of the strategies
computed by the two algorithms are more similar. The larger number for the
4FP is due to the fact that we have to track both the a and the b mode, possibly
resulting in multiple copies of the same a-mode state. We see that the state
difference decreases with the number of goals (upper part of Table 1, columns
9-12) and increases with the number of (non-goal) states (lower part of Table 1,
columns 9-12). In both cases, the 3FP still computes faster, but the difference
decreases with the number of goals.

In addition to the 3FP and the 4FP we have also tested a sound but incomplete
heuristic, which avoids the disjunction over all b's in every line of (15)
by only investigating a = b. The state count and computation times for this
heuristic are shown in Table 1, columns 7-8 for the original maze benchmark,
and in columns 13-14 for the modified one. We see that in both cases the heuristic
only returns a winning strategy if the maze is not wider than 3 cells. This
is due to the fact that in all other cases the robot cannot prevent the obstacle
from attaining a particular assumption state until the robot has moved from one
goal to the next. The 4FP handles this problem by changing between avoided
assumptions in between visits to different goals. Intuitively, the computation
times and state counts for the heuristic should be smaller than for the 4FP, as
the exploration of the disjunction over b's is avoided, which is true for many
scenarios of the considered benchmark. It should however be noted that this is
not always the case (compare e.g. line 3, columns 6 and 8). This stems from the
fact that restricting the synthesis to avoiding one particular assumption might
require more iterations over W and Y within the fixed-point computation.


Table 1. Experimental results for the maze benchmark. The size of the maze is given
in columns/lines, the number of goals is given per player. The states are counted for the
returned winning strategies. Strategies preventing the environment from fulfilling its
goals are indicated by a *. Recorded computation times are rounded wall-clock times.


              |          falsifiable assumptions          |        non-falsifiable assumptions
              |     3FP     |     4FP     |  Heuristic  |     3FP     |     4FP     |  Heuristic
size  | goals | states|time | states|time | states|time | states|time | states|time | states|time
------+-------+-------+-----+-------+-----+-------+-----+-------+-----+-------+-----+-------+-----
3/2   |   2   |   10* | <1s |    46 | <1s |    12 | <1s |    35 | <1s |    50 | <1s |    40 | <1s
3/10  |  10   |   34* | <1s |  1401 |  8s |  1307 |  3s |  1119 |  1s |  1513 | 13s |  1533 |  5s
3/20  |  20   |   64* | 21s |  5799 |201s |  5732 |337s |  3926 | 37s |  6000 |163s |  6378 |105s
25/2  |   2   |   94* | <1s |  2144 |  4s |  n.r. |  6s |   744 | <1s |  2318 |  4s |  n.r. |  5s
63/2  |   2   |  397* | <1s | 14259 | 32s |  n.r. |101s |  4938 |  2s | 15465 | 54s |  n.r. | 66s
7 Discussion 


We believe the requirement that a winning strategy be non-conflicting is a sim- 
ple way to disallow strategies that win by actively preventing the environment 
from satisfying its assumptions, without significantly changing the theoretical 
formulation of reactive synthesis (e.g., by adding different winning conditions or 
new notions of equilibria). It is not a trace property, but our main results show 
that adding this requirement retains the algorithmic niceties of GR(1) synthesis: 
in particular, symbolic algorithms have the same asymptotic complexity. 

However, non-conflictingness makes the implicit assumption of a “maximally 
flexible” environment: it is possible that because of unmodeled aspects of the 
environment strategy, it is not possible for the environment to satisfy its spec- 
ifications in the precise way allowed by a non-conflicting strategy. In the maze 
example discussed in Sect. 1, the environment needs to move the obstacle to pre- 
cisely the goal cell which is currently rendered reachable by the system. If the 
underlying dynamics of the obstacle require it to go back to the lower left from 
state q3 before proceeding to the upper right (e.g., due to a required battery 
recharge), the synthesized robot strategy prevents the obstacle from doing so. 

Finally, if there is no non-conflicting winning strategy, one could look for a 
“minimally violating” strategy. We leave this for future work. Additionally, we 
leave for future work the consideration of non-conflictingness for general LTL 
specifications or (efficient) fragments thereof. 
Abstract. StocHy is a software tool for the quantitative analysis of
discrete-time stochastic hybrid systems (SHS). StocHy accepts a high-level
description of stochastic models and constructs an equivalent SHS model.
The tool allows the user to (i) simulate the SHS evolution over a given time
horizon, and to automatically construct formal abstractions of the SHS.
Abstractions are then employed for (ii) formal verification or (iii) control
(policy, strategy) synthesis. StocHy allows for modular modelling,
and has separate simulation, verification and synthesis engines, which
are implemented as independent libraries. This allows for libraries to be
easily used and for extensions to be easily built. The tool is implemented
in C++ and employs manipulations based on vector calculus, the use of
sparse matrices, the symbolic construction of probabilistic kernels, and
multi-threading. Experiments show StocHy's markedly improved performance
when compared to existing abstraction-based approaches: in particular,
StocHy beats state-of-the-art tools in terms of precision (abstraction
error) and computational effort, and finally attains scalability to
large-sized models (12 continuous dimensions). StocHy is available at
www.gitlab.com/natchi92/StocHy. Data or code related to this paper is
available at: [31].


1 Introduction 


Stochastic hybrid systems (SHS) are a rich mathematical modelling framework 
capable of describing systems with complex dynamics, where uncertainty and 
hybrid (that is, both continuous and discrete) components are relevant. Whilst 
earlier instances of SHS have a long history, SHS proper have been thoroughly 
investigated only from the mid 2000s, and have been most recently applied to the 
study of complex systems, both engineered and natural. Amongst engineering 
case studies, SHS have been used for modelling and analysis of micro grids [29],
smart buildings [23], avionics [7], and automation of medical devices [3]. A benchmark
for SHS is also described in [10]. However, a wider adoption of SHS in real-world
applications is stymied by a few factors: (i) the complexity associated
with modelling SHS; (ii) the generality of their mathematical framework, which
requires an arsenal of advanced and diverse techniques to analyse them; and (iii) 
the undecidability of verification/synthesis problems over SHS and the curse of 
dimensionality associated with their approximations. 


© The Author(s) 2019 
T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 247-264, 2019. 
https://doi.org/10.1007/978-3-030-17465-1_14 
This paper introduces a new software tool, StocHy, which is aimed at
simplifying both the modelling of SHS and their analysis, and which targets
the wider adoption of SHS, also by non-expert users. With focus on the three
limiting factors above, StocHy allows users to describe SHS by parsing or extending
well-known and widely used state-space models; it then generates a standard SHS model
automatically and formats it for analysis. StocHy can (i) perform verification
tasks, e.g., compute the probability of staying within a certain region of the state
space from a given set of initial conditions; (ii) automatically synthesise policies
(strategies) maximising this probability; and (iii) simulate the SHS evolution over
time. StocHy is implemented in C++ and is modular, making it both extensible
and portable.


Related work. There exist only a few tools that can handle (classes of)
SHS. Of much inspiration for this contribution, FAUST² [28] generates abstractions
for uncountable-state discrete-time stochastic processes, natively supporting
SHS models with a single discrete mode and finite actions, and performs
verification of reachability-like properties and corresponding synthesis of policies.
FAUST² is naively implemented in MATLAB and lacks scalability to large
models. The MODEST TOOLSET [18] allows to model and to analyse classes of
continuous-time SHS, particularly probabilistic hybrid automata (PHA) that combine
probabilistic discrete transitions with deterministic evolution of the continuous
variables. The tool for stochastic and dynamically coloured Petri nets
(SDCPN) [13] supports compositional modelling of PHA and focuses on simulation
via Monte Carlo techniques. The existing tools highlight the need for new software
that allows for (i) straightforward and general SHS modelling construction
and (ii) scalable automated analysis.


Contributions. The StocHy tool newly enables

- formal verification of SHS via either of two abstraction techniques:

  • for discrete-time, continuous-space models with additive disturbances,
    and possibly with multiple discrete modes, we employ formal abstractions
    as general Markov chains or Markov decision processes [28]; StocHy
    improves techniques in the FAUST² tool by simplifying the input model
    description, by employing sparse matrices to manipulate the transition
    probabilities and by reducing the computational time needed to generate
    the abstractions.

  • for models with a finite number of actions, we employ interval Markov
    decision processes and the model checking framework in [22]; StocHy
    provides a novel abstraction algorithm allowing for efficient computation of
    the abstract model, by means of an adaptive and sequential refining of
    the underlying abstraction. We show that we are able to generate
    significantly smaller abstraction errors and to verify models with up to 12
    continuous variables.

- control (strategy, policy) synthesis via formal abstractions, employing:

  • stochastic dynamic programming; StocHy exploits the use of symbolic
    kernels.

  • robust synthesis using interval Markov decision processes; StocHy
    automates the synthesis algorithm with the abstraction procedure and the
    temporal property of interest, and exploits the use of sparse matrices;

- simulation of complex stochastic processes, such as SHS, by means of Monte
  Carlo techniques; StocHy automatically generates statistics from the
  simulations in the form of histograms, visualising the evolution of both the
  continuous random variables and the discrete modes.


This contribution is structured as follows: Sect. 2 crisply presents the theoretical
underpinnings (modelling and analysis) for the tool. We provide an overview
of the implementation of StocHy in Sect. 3. We highlight features and use of
StocHy by a set of experimental evaluations in Sect. 4: we provide four different
case studies that highlight the applicability, ease of use, and scalability of
StocHy. Instructions for executing all the case studies are given in this paper and
on a Wiki page that accompanies the StocHy distribution.


2 Theory: Models, Abstractions, Simulations 


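2.1 Models - Discrete-Time Stochastic Hybrid Systems

StocHy supports the modelling of the following general class of SHS [1,4].

Definition 1. A SHS [4] is a discrete-time model defined as the tuple

$\mathcal{H} = (\mathcal{Q}, n, \mathcal{U}, T_x, T_q)$, where (1)


- $\mathcal{Q} = \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$, represents a finite set of modes (locations);

- $n \in \mathbb{N}$ is the dimension of the continuous space $\mathbb{R}^n$ of each mode; the hybrid
  state space is then given by $\mathcal{D} = \bigcup_{q \in \mathcal{Q}} \{q\} \times \mathbb{R}^n$;

- $\mathcal{U}$ is a continuous set of actions, e.g. $\mathbb{R}^v$;

- $T_q : \mathcal{Q} \times \mathcal{D} \times \mathcal{U} \to [0,1]$ is a discrete stochastic kernel on $\mathcal{Q}$ given $\mathcal{D} \times \mathcal{U}$,
  which assigns to each $s = (q,x) \in \mathcal{D}$ and $u \in \mathcal{U}$ a probability distribution
  over $\mathcal{Q}$: $T_q(\cdot|s,u)$;

- $T_x : \mathcal{B}(\mathbb{R}^n) \times \mathcal{D} \times \mathcal{U} \to [0,1]$ is a Borel-measurable stochastic kernel on $\mathbb{R}^n$
  given $\mathcal{D} \times \mathcal{U}$, which assigns to each $s \in \mathcal{D}$ and $u \in \mathcal{U}$ a probability measure
  on the Borel space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$: $T_x(\cdot|s,u)$.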


In this model the discrete component takes values in a finite set $\mathcal{Q}$ of modes
(a.k.a. locations), each endowed with a continuous domain (the Euclidean space
$\mathbb{R}^n$). As such, a point s over the hybrid state space $\mathcal{D}$ is a pair (q, x), where $q \in \mathcal{Q}$
and $x \in \mathbb{R}^n$. The semantics of transitions at any point over a discrete time
domain are as follows: given a point $s \in \mathcal{D}$, the discrete state is chosen from
$T_q$, and depending on the selected mode $q \in \mathcal{Q}$ the continuous state is updated
according to the probabilistic law $T_x$. Non-determinism in the form of actions
can affect both discrete and continuous transitions.
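
The transition semantics above can be made concrete with a small simulation step. The following is a minimal sketch under stated assumptions, not StocHy's implementation: it covers only the autonomous linear Gaussian instance (no actions), the mode transition matrix is independent of the continuous state, and all names (`Mode`, `HybridState`, `multiply`, `step`) are hypothetical. Only the C++ standard library is used.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// One mode of the linear autonomous instance: x' = A x + F + G w, w ~ N(0, I).
struct Mode {
    Mat A;
    Vec F;
    Mat G;
};

// A point s = (q, x) of the hybrid state space.
struct HybridState {
    int q;
    Vec x;
};

// Dense matrix-vector product.
Vec multiply(const Mat& M, const Vec& v) {
    Vec out(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i) {
        assert(M[i].size() == v.size());
        for (std::size_t j = 0; j < v.size(); ++j) out[i] += M[i][j] * v[j];
    }
    return out;
}

// One transition: first the discrete mode, then the continuous state.
HybridState step(const Mat& Tq, const std::vector<Mode>& modes,
                 const HybridState& s, std::mt19937& rng) {
    // Discrete transition: sample the next mode from row s.q of Tq.
    const Vec& row = Tq[static_cast<std::size_t>(s.q)];
    std::discrete_distribution<int> pickMode(row.begin(), row.end());
    const int qNext = pickMode(rng);

    // Continuous transition under the dynamics of the selected mode.
    const Mode& m = modes[static_cast<std::size_t>(qNext)];
    std::normal_distribution<double> stdNormal(0.0, 1.0);
    Vec w(s.x.size());
    for (double& wi : w) wi = stdNormal(rng);

    Vec xNext = multiply(m.A, s.x);
    const Vec noise = multiply(m.G, w);
    for (std::size_t i = 0; i < xNext.size(); ++i) xNext[i] += m.F[i] + noise[i];
    return HybridState{qNext, xNext};
}
```

Calling `step` repeatedly with the same random engine yields one sampled trajectory of the hybrid state.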
Remark 1. A rigorous characterisation of SHS can be found in [1], which introduces
a general class of models with probabilistic resets and a hybrid action
space. Whilst we can deal with general SHS models, in the case studies of this
paper we focus on special instances, as described next.


Remark 2 (Special instance). In Case Study 2 (see Sect. 4.2) we look at models
where actions are associated to a deterministic selection of locations, namely
$T_q : \mathcal{U} \to \mathcal{Q}$ and $\mathcal{U}$ is a finite set of actions.


Remark 3 (Special instance). In Case Study 4 (Sect. 4.4) we consider non-linear
dynamical models with bilinear terms, which are characterised for any $q \in \mathcal{Q}$ by
$x_{k+1} = A_q x_k + B_q u_k + x_k \sum_{i=1}^{v} N_{q,i} u_{i,k} + G_q w_k$, where $k \in \mathbb{N}$ represents the
discrete time index, $A_q$, $B_q$, $G_q$ are appropriately sized matrices, $N_{q,i}$ represents
the bilinear influence of the i-th input component $u_i$, and $w_k = w \sim \mathcal{N}(\cdot; 0, 1)$,
where $\mathcal{N}(\cdot; \eta, \nu)$ denotes a Gaussian density function with mean $\eta$ and covariance
matrix $\nu^2$. This expresses the continuous kernel $T_x : \mathcal{B}(\mathbb{R}^n) \times \mathcal{D} \times \mathcal{U} \to [0,1]$ as


$\mathcal{N}(\cdot; A_q x + B_q u + x \sum_{i=1}^{v} N_{q,i} u_i + F_q, G_q).$ (2)


In Case Studies 1, 2 and 3 (Sects. 4.1-4.3), we look at the special instance from [22],
where the dynamics are autonomous (no actions) and linear: here $T_x$ is


$\mathcal{N}(\cdot; A_q x + F_q, G_q),$ (3)


where in Case Studies 1 and 3 $\mathcal{Q}$ is a single element.


Definition 2. A Markov decision process (MDP) [5] is a discrete-time model
defined as the tuple
$\mathcal{H} = (\mathcal{Q}, \mathcal{U}, T_q)$, where (4)


- $\mathcal{Q} = \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$, represents a finite set of modes;

- $\mathcal{U}$ is a finite set of actions;

- $T_q : \mathcal{Q} \times \mathcal{Q} \times \mathcal{U} \to [0,1]$ is a discrete stochastic kernel that assigns, to each
  $q \in \mathcal{Q}$ and $u \in \mathcal{U}$, a probability distribution over $\mathcal{Q}$: $T_q(\cdot|q,u)$.


Whenever the set of actions is trivial or a policy is synthesised and used (cf.
discussion in Sect. 2.2) the MDP reduces to a Markov chain (MC), and a kernel
$T_q : \mathcal{Q} \times \mathcal{Q} \to [0,1]$ assigns to each $q \in \mathcal{Q}$ a distribution over $\mathcal{Q}$ as $T_q(\cdot|q)$.


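Definition 3. An interval Markov decision process (IMDP) [26] extends the syntax
of an MDP by allowing for uncertain $T_q$, and is defined as the tuple
$\mathcal{H} = (\mathcal{Q}, \mathcal{U}, \check{P}, \hat{P})$, where (5)


- $\mathcal{Q}$ and $\mathcal{U}$ are as in Definition 2;

- $\check{P}$ and $\hat{P} : \mathcal{Q} \times \mathcal{U} \times \mathcal{Q} \to [0,1]$ is a function that assigns to each $q \in \mathcal{Q}$
  a lower (upper) bound probability distribution over $\mathcal{Q}$: $\check{P}(\cdot|q,u)$ ($\hat{P}(\cdot|q,u)$
  respectively).

For all $q, q' \in \mathcal{Q}$ and $u \in \mathcal{U}$, it holds that $\check{P}(q'|q,u) \le \hat{P}(q'|q,u)$ and

$\sum_{q' \in \mathcal{Q}} \check{P}(q'|q,u) \le 1 \le \sum_{q' \in \mathcal{Q}} \hat{P}(q'|q,u).$

Note that when $\check{P}(\cdot|q,u) = \hat{P}(\cdot|q,u)$, the IMDP reduces to the MDP with
$T_q(\cdot|q,u) = \check{P}(\cdot|q,u) = \hat{P}(\cdot|q,u)$.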
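
The two conditions on the bounds can be checked mechanically for each state-action pair. The sketch below is illustrative only: the names `Row`, `isValidIntervalRow` and `isPointRow` and the tolerance parameter are assumptions introduced here, and one row holds the bounds over all successor states for a fixed pair (q, u).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;  // bounds over successor states for a fixed (q, u)

// Checks lower <= upper entrywise and sum(lower) <= 1 <= sum(upper).
bool isValidIntervalRow(const Row& lower, const Row& upper, double tol = 1e-12) {
    if (lower.size() != upper.size()) return false;
    double sumLower = 0.0;
    double sumUpper = 0.0;
    for (std::size_t i = 0; i < lower.size(); ++i) {
        if (lower[i] < 0.0 || upper[i] > 1.0 || lower[i] > upper[i]) return false;
        sumLower += lower[i];
        sumUpper += upper[i];
    }
    return sumLower <= 1.0 + tol && sumUpper >= 1.0 - tol;
}

// True when lower and upper bounds coincide, i.e. the row is an ordinary MDP row.
bool isPointRow(const Row& lower, const Row& upper, double tol = 1e-12) {
    if (lower.size() != upper.size()) return false;
    for (std::size_t i = 0; i < lower.size(); ++i)
        if (upper[i] - lower[i] > tol) return false;
    return true;
}
```

A row with lower bounds {0.6, 0.6} is rejected regardless of the upper bounds, since no probability distribution can respect lower bounds that sum to more than one.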


2.2 Formal Verification and Strategy Synthesis via Abstractions 


Formal verification and strategy synthesis over SHS are in general not decidable
[4,30], and can be tackled via quantitative finite abstractions. These are
precise approximations that come in two main flavours: abstractions
into MDP [4,28] and into IMDP [22]. Once the finite abstractions are obtained,
and with focus on specifications expressed in (non-nested) PCTL or fragments
of LTL [5], formal verification or strategy synthesis can be performed via
probabilistic model checking tools, such as PRISM [21], STORM [12], and IscasMC [17]. We
overview next the two alternative abstractions, as implemented in StocHy.


Abstractions into Markov decision processes. Following [27], MDP are 
generated by either (i) uniformly gridding the state space and computing an 
abstraction error, which depends on the continuity of the underlying continu- 
ous dynamics and on the chosen grid; or (ii) generating the grid adaptively and 
sequentially, by splitting the cells with the largest local abstraction error until a 
desired global abstraction error is achieved. The two approaches display an intu- 
itive trade-off, where the first in general requires more memory but less time, 
whereas the second generates smaller abstractions. Either way, the probability to 
transit from each cell in the grid into any other cell characterises the MDP matrix
$T_q$. Further details can be found in [28]. StocHy newly provides a C++
implementation and employs sparse matrix representation and manipulation, in order
to attain faster generation of the abstraction and use in formal verification or
strategy synthesis.
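
To illustrate how cell-to-cell transition probabilities arise from a uniform grid, the following minimal sketch treats a scalar system x' = a x + sigma w with standard normal noise w. It is not the algorithm of [27,28] and it omits the abstraction error computation. Taking the cell centre as representative point, dropping negligible entries to keep rows sparse, and all names (`normalCdf`, `SparseRow`, `gridAbstraction`) are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Standard normal cumulative distribution function.
double normalCdf(double z) { return 0.5 * std::erfc(-z / std::sqrt(2.0)); }

// One sparse row: (target cell index, probability) pairs.
using SparseRow = std::vector<std::pair<std::size_t, double>>;

// Uniform grid over [lo, hi] for x' = a*x + sigma*w, w ~ N(0,1).
// Row i holds the probabilities of moving from the centre of cell i into each cell j.
std::vector<SparseRow> gridAbstraction(double a, double sigma, double lo, double hi,
                                       std::size_t cells, double dropBelow = 1e-12) {
    assert(sigma > 0.0 && hi > lo && cells > 0);
    const double width = (hi - lo) / static_cast<double>(cells);
    std::vector<SparseRow> T(cells);
    for (std::size_t i = 0; i < cells; ++i) {
        const double centre = lo + (static_cast<double>(i) + 0.5) * width;
        const double mean = a * centre;
        for (std::size_t j = 0; j < cells; ++j) {
            const double left = lo + static_cast<double>(j) * width;
            const double right = left + width;
            const double p = normalCdf((right - mean) / sigma) -
                             normalCdf((left - mean) / sigma);
            if (p > dropBelow) T[i].emplace_back(j, p);  // keep the row sparse
        }
    }
    return T;
}
```

Each row sums to at most one; the missing mass is the probability of leaving the gridded domain in one step.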


Verification via MDP (when the action set is trivial) is performed to check the 
abstraction against non-nested, bounded-until specifications in PCTL [5] or co- 
safe linear temporal logic (CSLTL) [20]. 


Strategy synthesis via MDP is defined as follows. Consider the class of deterministic
and memoryless Markov strategies $\pi = (\mu_0, \mu_1, \ldots)$ where $\mu_k : \mathcal{Q} \to \mathcal{U}$. We
compute the strategy $\pi^*$ that maximises the probability of satisfying a formula,
with algorithms discussed in [28].
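
For a bounded safety formula, such a maximising Markov strategy can be obtained by backward dynamic programming over the finite abstraction. The sketch below shows this standard recursion on an explicit MDP; it is not claimed to be the implementation in [28] or in StocHy. The layout `T[u][q][q']` and the names `Kernel`, `SafetyResult` and `maxSafety` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// T[u][q][r] = probability of moving from state q to state r under action u.
using Kernel = std::vector<std::vector<std::vector<double>>>;

struct SafetyResult {
    std::vector<double> value;                     // max probability of staying safe for K steps
    std::vector<std::vector<std::size_t>> policy;  // policy[k][q] = action chosen at step k
};

SafetyResult maxSafety(const Kernel& T, const std::vector<bool>& safe, std::size_t K) {
    assert(!T.empty());
    const std::size_t n = safe.size();
    std::vector<double> V(n);
    for (std::size_t q = 0; q < n; ++q) V[q] = safe[q] ? 1.0 : 0.0;
    std::vector<std::vector<std::size_t>> policy(K, std::vector<std::size_t>(n, 0));
    // Backward recursion: V_k(q) = 1_safe(q) * max_u sum_r T(r | q, u) * V_{k+1}(r).
    for (std::size_t k = K; k-- > 0;) {
        std::vector<double> next(n, 0.0);
        for (std::size_t q = 0; q < n; ++q) {
            if (!safe[q]) continue;  // unsafe states keep value 0
            double best = -1.0;
            for (std::size_t u = 0; u < T.size(); ++u) {
                double expected = 0.0;
                for (std::size_t r = 0; r < n; ++r) expected += T[u][q][r] * V[r];
                if (expected > best) {
                    best = expected;
                    policy[k][q] = u;
                }
            }
            next[q] = best;
        }
        V = next;
    }
    return SafetyResult{V, policy};
}
```

The returned policy is time-dependent but memoryless, matching the strategy class $\pi = (\mu_0, \mu_1, \ldots)$ with `policy[k]` playing the role of $\mu_k$.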


Abstraction into Interval Markov decision processes (IMDP) is based on a
procedure in [11] performed using a uniform grid and with a finite set of actions
$\mathcal{U}$ (see Remark 2). StocHy newly provides the option to generate a grid using
adaptive/sequential refinements (similar to the case in the paragraph above) [27],
which is performed as follows: (i) define the maximum admissible abstraction
error $\varepsilon_{max}$; (ii) generate a coarse abstraction using the algorithm in [11] and
compute the local error $\varepsilon_q$ that is associated to each abstract state q; (iii) split
all cells where $\varepsilon_q > \varepsilon_{max}$ along the main axis of each dimension, and update the
probability bounds (and errors); and (iv) repeat this process until $\forall q, \varepsilon_q \le \varepsilon_{max}$.
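
Steps (i)-(iv) amount to a split-until-accurate loop. The sketch below shows only that control flow for one-dimensional cells. The local error model (a constant `h` times the cell width) is a placeholder assumption standing in for the error bound of [11], and the names `Cell`, `localError` and `refine` are hypothetical. In higher dimensions each offending cell would be split along every axis rather than halved.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Cell {
    double lo;
    double hi;
};

// Placeholder local error: proportional to the cell width (assumption for illustration).
double localError(const Cell& c, double h) { return h * (c.hi - c.lo); }

// Repeatedly halves every cell whose local error exceeds epsMax.
std::vector<Cell> refine(std::vector<Cell> grid, double h, double epsMax) {
    assert(h > 0.0 && epsMax > 0.0);
    bool splitHappened = true;
    while (splitHappened) {
        splitHappened = false;
        std::vector<Cell> next;
        for (const Cell& c : grid) {
            if (localError(c, h) > epsMax) {
                const double mid = 0.5 * (c.lo + c.hi);
                next.push_back({c.lo, mid});
                next.push_back({mid, c.hi});
                splitHappened = true;
            } else {
                next.push_back(c);
            }
        }
        grid = std::move(next);
    }
    return grid;
}
```

With this error model the loop terminates because each split halves the error of the affected cell; a real implementation would also recompute the probability bounds of the new cells after every split.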


Verification via IMDP is run over properties in CSLTL or bounded-LTL 
(BLTL) form using the IMDP model checking algorithm in [22]. 
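
A core step in model checking over interval bounds is to evaluate the worst-case expected value of a successor distribution that is only known up to its lower and upper bounds. The sketch below uses the commonly used greedy ordering scheme for this step; whether [22] proceeds in exactly this way is not stated in the text, so treat it as an assumption. The name `worstCaseExpectation` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimises sum_q p[q] * value[q] over all distributions p with
// lower[q] <= p[q] <= upper[q] and sum_q p[q] = 1.
double worstCaseExpectation(const std::vector<double>& lower,
                            const std::vector<double>& upper,
                            const std::vector<double>& value) {
    const std::size_t n = value.size();
    assert(lower.size() == n && upper.size() == n);

    // Visit successor states from the lowest value to the highest.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&value](std::size_t a, std::size_t b) { return value[a] < value[b]; });

    // Every state receives its lower bound; the remaining mass goes to low-value states first.
    double budget = 1.0 - std::accumulate(lower.begin(), lower.end(), 0.0);
    assert(budget >= -1e-12);
    double expectation = 0.0;
    for (std::size_t q : order) {
        const double extra = std::min(upper[q] - lower[q], std::max(budget, 0.0));
        expectation += (lower[q] + extra) * value[q];
        budget -= extra;
    }
    return expectation;
}
```

Sorting in descending order instead gives the best-case expectation, so the same routine yields both ends of the satisfaction probability interval.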


Synthesis via IMDP [11] is carried out by extending the notions of strategies of 
MDP to depend on memory, that is on prefixes of paths. 


2.3 Analysis via Monte Carlo Simulations 


Monte Carlo techniques generate numerical sampled trajectories representing 
the evaluation of a stochastic process over a predetermined time horizon. Given 
a sufficient number of trajectories, one can approximate the statistical properties 
of the solution process with a required confidence level. This approach has been 
adopted for simulation of different types of SHS. [19] applies sequential Monte 
Carlo simulation to SHS to reason about rare-event probabilities. [13] performs 
Monte Carlo simulations of classes of SHS described as Petri nets. [8] proposes 
a methodology for efficient Monte Carlo simulations of continuous-time SHS. In 
this work, we analyse a SHS model using Monte Carlo simulations following the 
approach in [4]. Additionally, we generate histogram plots at each time step, 
providing further insight on the evolution of the solution process. 
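
As a rough illustration of the Monte Carlo approach, the sketch below estimates the probability that a scalar system x' = a x + sigma w stays inside an interval for K steps by averaging over sampled trajectories. It is not StocHy's simulator. The scalar dynamics, the fixed default seed and the name `estimateSafety` are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Fraction of sampled trajectories of x' = a*x + sigma*w, w ~ N(0,1),
// that remain inside [lo, hi] at steps 0..K.
double estimateSafety(double a, double sigma, double x0, double lo, double hi,
                      std::size_t K, std::size_t runs, unsigned seed = 42) {
    assert(runs > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> stdNormal(0.0, 1.0);
    std::size_t safeRuns = 0;
    for (std::size_t r = 0; r < runs; ++r) {
        double x = x0;
        bool safe = (lo <= x && x <= hi);
        for (std::size_t k = 0; safe && k < K; ++k) {
            x = a * x + sigma * stdNormal(rng);  // one sampled transition
            safe = (lo <= x && x <= hi);
        }
        if (safe) ++safeRuns;
    }
    return static_cast<double>(safeRuns) / static_cast<double>(runs);
}
```

The accuracy of the estimate improves with the number of runs; recording the sampled states at each step instead of only the safety flag would give the data for per-step histograms.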


3 Overview of StocHy 


Installation. StocHy is set up using the provided GET_DEP file found within 
the distribution package, which will automatically install all the required depen- 
dencies. The executable RUN.SH builds and runs StocHy. This basic installation 
setup has been successfully tested on machines running the Ubuntu 18.04.1 LTS
GNU/Linux operating system.


Input interface. The user interacts with StocHy via the MAIN file and must
specify (i) a high-level description of the model dynamics and (ii) the task to
be performed. The description of the model dynamics can take the form of a list
of the transition probabilities between the discrete modes, and of the state-space
models for the continuous variables in each mode; alternatively, a description
can be obtained by specifying a path to a MATLAB file containing the
model description in state-space form together with the transition probability
matrix. Tasks can be of three kinds (each admitting specific parameters):
simulation, verification, or synthesis. The general structure of the input interface
is illustrated via an example in Listing 1.1: here the user is interested in
simulating a SHS with two discrete modes $\mathcal{Q} = \{q_0, q_1\}$ and two continuous
variables that evolve according to (3). The model is autonomous and has no control
actions. The relationship between the discrete modes is defined by a fixed
transition probability matrix (line 1). The evolution of the continuous dynamics is
defined in lines 2-14. The initial conditions for both the discrete modes and
 1  arma::mat Tq = {{0.4, 0.6}, {0.7, 0.3}}; // Transition probabilities
 2  // Evolution of the continuous variables for each discrete mode
 3  // First model
 4  arma::mat Aq0 = {{0.5, 0.4}, {0.2, 0.6}};
 5  arma::mat Fq0 = {{0}, {0}};
 6  arma::mat Gq0 = {{0.4, 0}, {0.3, 0.3}};
 7  ssmodels_t modelq0(Aq0, Fq0, Gq0);
 8  // Second model
 9  arma::mat Aq1 = {{0.6, 0.3}, {0.1, 0.7}};
10  arma::mat Fq1 = {{0}, {0}};
11  arma::mat Gq1 = {{0.2, 0}, {0.1, 0}};
12  ssmodels_t modelq1(Aq1, Fq1, Gq1);
13  std::vector<ssmodels_t> models =
14      {modelq0, modelq1};
15  // Initial state q_0
16  arma::mat q_init = arma::zeros<arma::mat>(1, 1);
17  // Initial continuous variables
18  arma::mat x1_init = arma::ones<arma::mat>(2, 1);
19  exdata_t data(x1_init, q_init);
20  // Build shs
21  shs_t<arma::mat, int> mySHS(Tq, models, data);
22  // Time horizon
23  int K = 32;
24  // Task definition (1 = simulator, 2 = faust^2, 3 = imdp)
25  int lb = 1;
26  taskSpec_t mySpec(lb, K);
27  // Combine
28  inputSpec_t<arma::mat, int> myInput(mySHS, mySpec);
29  // Perform task
30  performTask(myInput);


Listing 1.1: Description of MAIN file for simulating a SHS consisting of two discrete
modes and two continuous variables evolving according to (3).


the continuous variables are set in lines 15-19 (this is needed for simulation
tasks). The equivalent SHS model is then set up by instantiating an object of
type shs_t<arma::mat,int> (line 21). Next, the task is defined in line 26
(simulation with a time horizon K = 32, as specified in line 23, and using the
simulator library, as set in line 25). We combine the model and task specification
together in line 28. Finally, StocHy carries out the simulation using the function
performTask (line 30).


Modularity. StocHy comprises independent libraries for different tasks, namely
(i) FAUST², (ii) IMDP, and (iii) simulator. Each of the libraries is separate and
depends only on the model structure that has been entered. This allows for
seamless extensions of individual sub-modules with new or existing tools and
methods. The function performTask acts as a multiplexer for calling any of the
libraries depending on the input model and task specification.


Data structures. StocHy makes use of multiple techniques to minimise computational
overhead. It employs vector algebra for efficient handling of linear
operations, and whenever possible it stores and manipulates matrices as sparse
structures. It uses the linear algebra library Armadillo [24,25], which applies
multi-threading and a sophisticated expression evaluator that has been shown
to speed up matrix manipulations in C++ when compared to other libraries.
FAUST²-based abstractions define the underlying kernel functions symbolically
using the library GiNaC [6], for easy evaluation of the stochastic kernels.


Output interface. We provide outputs as text files for all three libraries, which
are stored within the RESULTS folder. We also provide additional PYTHON scripts
for generating plots as needed. For abstractions based on FAUST², the user has
the additional option to export the generated MDP or MC to PRISM format,
to interface with the popular model checker [21] (StocHy offers the user this
option following the completion of the verification or synthesis task). As a future
extension, we plan to export the generated abstraction models to the model
checker STORM [12] and to the modelling format JANI [9].


4 StocHy: Experimental Evaluation 


We apply StocHy on four different case studies highlighting different models and 
tasks to be performed. All the experiments are run on a standard laptop, with
an Intel Core i7-8550U CPU at 1.80 GHz × 8 and with 8 GB of RAM.


4.1 Case Study 1 - Formal Verification 


We consider the SHS model first presented in [2]. The model takes the form of (1),
and has one discrete mode and two continuous variables representing the level
of CO₂ ($x_1$) and the ambient temperature ($x_2$), respectively. The continuous
variables evolve according to


$x_{1,k+1} = x_{1,k} + \frac{\Delta}{V}\left(-\rho_m x_{1,k} + \varrho_c (C_{out} - x_{1,k})\right) + \sigma_1 w_k,$ (6)

$x_{2,k+1} = x_{2,k} + \frac{\Delta}{C_z}\left(\rho_m C_{pa} (T_{set} - x_{2,k}) + \frac{\varrho_c}{R} (T_{out} - x_{2,k})\right) + \sigma_2 w_k,$


where $\Delta$ is the sampling time [min], $V$ is the volume of the zone [m³], $\rho_m$ is
the mass air flow pumped inside the room [m³/min], $\varrho_c$ is the natural drift air
flow [m³/min], $C_{out}$ is the outside CO₂ level [ppm/min], $T_{set}$ is the desired
temperature [°C], $T_{out}$ is the outside temperature [°C/min], $C_z$ is the zone
capacitance [Jm³/°C], $C_{pa}$ is the specific heat capacity of air [J/°C], $R$ is the
resistance to heat transfer [°C/J], and $\sigma_{(\cdot)}$ is a variance term associated to the
noise $w_k \sim \mathcal{N}(0, 1)$.

We are interested in verifying whether the continuous variables remain within
the safe set $X_{safe} = [405, 540] \times [18, 24]$ over 45 min ($K = 3$). This property can
be encoded as a BLTL property, $\varphi_1 := \square^{\le K} X_{safe}$, where $\square$ is the “always”
temporal operator considered over a finite horizon. The semantics of BLTL is
defined over finite traces, denoted by $\zeta = \{\zeta_j\}_{j=0}^{K}$. A trace $\zeta$ satisfies $\varphi_1$ if
$\forall j \le K,\ \zeta_j \in X_{safe}$, and we quantify the probability that traces generated by
the SHS satisfy $\varphi_1$.
Case study 1: Listings explaining task specification for (a) FAUST² and (b) IMDP

     1  // Dynamics definition
     2  shs_t<arma::mat,int> myShs("../CS1.mat");
     3  // Specification for FAUST^2
     4  // safe set
     5  arma::mat safe = {{405, 540}, {18, 24}};
     6  // max error
     7  double eps = 1;
     8  // grid type
     9  // (1 = uniform, 2 = adaptive)
    10  int gridType = 1;
    11  // time horizon
    12  int K = 3;
    13  // task and property type
    14  // (1 = verify safety, 2 = verify reach-avoid,
    15  // 3 = safety synthesis, 4 = reach-avoid synthesis)
    16  int p = 1;
    17  // library (1 = simulator, 2 = faust^2, 3 = imdp)
    18  int lb = 2;
    19  // task specification
    20  taskSpec_t mySpec(lb,K,p,safe,eps,gridType);

Listing 1.2: (a) FAUST²

     1  // Dynamics definition
     2  shs_t<arma::mat,int> myShs("../CS1.mat");
     3  // Specification for IMDP
     4  // safe set
     5  arma::mat safe = {{405, 540}, {18, 24}};
     6  // grid size for each dimension
     7  arma::mat grid = {{0.0845, 0.0845}};
     8  // relative tolerance
     9  arma::mat reft = {{1, 1}};
    10  // time horizon
    11  int K = 3;
    12  // task and property type
    13  // (1 = verify safety, 2 = verify reach-avoid,
    14  // 3 = safety synthesis, 4 = reach-avoid synthesis)
    15  int p = 1;
    16  // library (1 = simulator, 2 = faust^2, 3 = imdp)
    17  int lb = 3;
    18  // task specification
    19  taskSpec_t mySpec(lb,K,p,safe,grid,reft);

Listing 1.3: (b) IMDP


When tackled with the method based on FAUST², which hinges on the computation
of Lipschitz constants, this verification task is numerically tricky, in view
of the difference in dimensionality of the range of $x_1$ and $x_2$ within the safe set $X_{safe}$
and the variance associated with each dimension, $G_{q_0} = \mathrm{diag}(\sigma_1, \sigma_2)$.
In order to mitigate this, StocHy automatically rescales the state space so that all
the dynamics evolve in a comparable range.


Implementation. StocHy provides two verification methods, one based on
FAUST² and the second based on IMDP. We parse the model from file CS1.MAT
(see line 2 of Listings 1.2(a) and 1.3(b), corresponding to the two methods).
CS1.MAT sets the parameter values of (6) and uses $\Delta = 15$ [min]. As anticipated,
we employ both techniques over the same model description:


- for FAUST² we specify the safe set ($X_{safe}$), the maximum allowable error, the
grid type (whether uniform or adaptive grid), the time horizon, together with
the type of property of interest (safety or reach-avoid). This is carried out in
lines 5-21 in Listing 1.2(a).
Table 1. Case study 1: Comparison of verification results for $\varphi_1$ when using FAUST² vs IMDP.

Tool      Impl.      |Q|        Time         Error
Method    Platform   [states]   [s]          $\varepsilon_{max}$
FAUST²    MATLAB      576         186.746    1
FAUST²    C           576          51.420    1
IMDP      C           576          87.430    0.236
FAUST²    MATLAB     1089         629.037    1
FAUST²    C          1089          78.140    1
IMDP      C          1089         387.940    0.174
FAUST²    MATLAB     2304        2633.155    1
FAUST²    C          2304         165.811    1
IMDP      C          2304        1552.950    0.121
FAUST²    MATLAB     3481        7523.771    1
FAUST²    C          3481         946.294    1
IMDP      C          3481        3623.090    0.098
FAUST²    MATLAB     4225       10022.850    0.900
FAUST²    C          4225        3313.990    0.900
IMDP      C          4225        4854.580    0.089

Fig. 1. Case study 1: Lower bound probability of satisfying $\varphi_1$ generated using
IMDP with 3481 states.


- for the IMDP method, we define the safe set ($X_{safe}$), the grid size, the relative
tolerance, the time horizon and the property type. This can be done by
defining the task specification using lines 5-21 in Listing 1.3(b).


Finally, to run either of the methods on the defined input model, we combine
the model and the task specification using inputSpec_t<arma::mat,int>
myInput(myShs,mySpec), then run the command performTask(myInput). The
verification results for both methods are stored in the RESULTS directory:


- for FAUST², StocHy generates four text files within the RESULTS folder:
REPRESENTATIVE_POINTS.TXT contains the partitioned state space;
TRANSITION_MATRIX.TXT consists of the transition probabilities of the generated
abstract MC; PROBLEM_SOLUTION.TXT contains the sat probability for each
state of the MC; and E.TXT stores the global maximum abstraction error.

- for IMDP, StocHy generates three text files in the same folder: STEPSMIN.TXT
stores the lower transition bound $\check{P}$ of the abstract IMDP; STEPSMAX.TXT stores the
upper bound $\hat{P}$; and SOLUTION.TXT
contains the sat probability and the errors $\varepsilon_q$ for each abstract state $q$.
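
For readers who post-process the two IMDP bound files, a basic sanity check on one row of interval transition probabilities is sketched below. It assumes that each row of the lower and upper bounds has already been loaded into a vector; the function name isValidIntervalRow and the tolerance are hypothetical. A row is accepted when every lower bound is at most the matching upper bound, all entries lie in [0, 1], and at least one probability distribution fits between the bounds (the lower bounds sum to at most 1 and the upper bounds to at least 1).

```cpp
#include <cstddef>
#include <vector>

// Checks one row of an interval transition matrix given as lower bounds `lo`
// and upper bounds `hi` over the same successor states.
inline bool isValidIntervalRow(const std::vector<double>& lo,
                               const std::vector<double>& hi,
                               double tol = 1e-12) {
    if (lo.empty() || lo.size() != hi.size()) return false;
    double sumLo = 0.0;
    double sumHi = 0.0;
    for (std::size_t i = 0; i < lo.size(); ++i) {
        if (lo[i] < -tol || hi[i] > 1.0 + tol) return false;  // outside [0, 1]
        if (lo[i] > hi[i] + tol) return false;                // crossed bounds
        sumLo += lo[i];
        sumHi += hi[i];
    }
    // A distribution p with lo <= p <= hi exists iff sumLo <= 1 <= sumHi.
    return sumLo <= 1.0 + tol && sumHi >= 1.0 - tol;
}
```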


Outcomes. We perform the verification task using both FAUST² and IMDP,
over different sizes of the abstraction grid. We employ uniform gridding for
both methods. We further compare the outcomes of StocHy against those of
the FAUST² tool, which is implemented in MATLAB [28]. Note that the IMDP consists
of $|Q| + 1$ states, where the additional state is the sink state $q_u = D \setminus X_{safe}$.
The results are shown in Table 1. We saturate (conservative) error outputs that
are greater than 1 to this value. We show the probability of satisfying the
formula obtained from IMDP for a grid size of 3481 states in Fig. 1; similar
probabilities are obtained for the remaining grid sizes. As evident from Table 1,
the new IMDP method outperforms the approach using FAUST² in terms of the
Fig. 2. Case study 2: (a) Gridded domain together with a superimposed simulation of
trajectory initialised at (−0.5, −1) within $q_0$, under the synthesised optimal switching
strategy $\pi^*$. Lower probabilities of satisfying $\varphi_2$ for mode $q_0$ (b) and for mode $q_1$ (c),
as computed by StocHy.


maximum error associated to the abstraction (FAUST² generates an abstraction
error < 1 only with 4225 states). Comparing FAUST² within StocHy and the
original FAUST² implementation (running in MATLAB), StocHy offers a computational
speed-up for the same grid size. This is due to the faster computation
of the transition probabilities, through StocHy’s use of matrix manipulations.
FAUST² within StocHy also simplifies the input of the dynamical model description:
in the original FAUST² implementation, the user is asked to manually input
the stochastic kernel in the form of symbolic equations in a MATLAB script.
This is not required when using StocHy, which automatically generates the underlying
symbolic kernels from the input state-space model descriptions.


4.2 Case Study 2 - Strategy Synthesis 


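We consider a stochastic process with two modes $Q = \{q_0, q_1\}$, which continuously
evolves according to (3) with

$$
A_{q_0} = \begin{bmatrix} 0.43 & 0.52 \\ \ldots & \ldots \end{bmatrix}, \quad
G_{q_0} = \begin{bmatrix} 1 & 0.1 \\ \ldots & \ldots \end{bmatrix}, \quad
A_{q_1} = \begin{bmatrix} 0.65 & 0.12 \\ \ldots & \ldots \end{bmatrix}, \quad
G_{q_1} = \begin{bmatrix} 0.2 & 0 \\ \ldots & \ldots \end{bmatrix}, \quad
F_{q_i} = \begin{bmatrix} 0 \\ \ldots \end{bmatrix},
$$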


and $i \in \{0, 1\}$. Consider the continuous domain shown in Fig. 2a over both
discrete locations. We plan to synthesise the optimal switching strategy $\pi^*$
that maximises the probability of reaching the green region, whilst avoiding
the purple one, over an unbounded time horizon, given any initial condition
within the domain. This can be expressed with the LTL formula $\varphi_2 :=
(\neg purple)\ \mathsf{U}\ green$, where $\mathsf{U}$ is the “until” temporal operator, and the atomic
propositions {purple, green} denote regions within the set $X = [-1.5, 1.5]^2$ (see
Fig. 2a).


Implementation. We define the model dynamics following lines 3-14 in Listing 1.1,
while we use Listing 1.3 to specify the synthesis task together
with its associated parameters. The LTL property $\varphi_2$ is over an unbounded
time horizon, which leads to employing the IMDP method for synthesis (recall
that the FAUST² implementation can only handle time-bounded properties, and
its abstraction error monotonically increases with the time horizon of the formula).
In order to encode the task we set the variable safe to correspond to
$X$, the grid size to 0.12 and the relative tolerance to 0.06 along both dimensions
(cf. lines 5-10 in Listing 1.3). We set the time horizon K = -1 to represent
an unbounded time horizon, let p = 4 to trigger the synthesis engine over the
given specification and make lb = 3 to use the IMDP method (cf. lines 12-19 in
Listing 1.3). This task specification partitions the set $X$ into the underlying
IMDP via uniform gridding. Alternatively, the user has the option to make use
of the adaptive-sequential algorithm by defining a new variable eps_max, which
characterises the maximum allowable abstraction error, and then specifying the task
using taskSpec_t mySpec(lb,K,p,boundary,eps_max,grid,rtol);. Next, we
define two files (PHI1.TXT and PHI2.TXT) containing the coordinates within the
gridded domain (see Fig. 2a) associated with the atomic propositions purple and
green, respectively. This allows for automatic labelling of the state space over
which synthesis is to be performed. Running the main file, StocHy generates a
SOLUTION.TXT file within the RESULTS folder. This contains the synthesised policy
$\pi^*$, the lower bound for the probabilities of satisfying $\varphi_2$, and the local errors
$\varepsilon_q$ for any region $q$.


Outcomes. The case study generates an abstraction with a total of 2410 states,
a maximum probability of 1, a maximum abstraction error of 0.21, and it requires
a total time of 1639.3 [s]. In this case, we witness a slightly larger abstraction
error via the IMDP method than in the previous case study. This is due to the
non-diagonal covariance matrix $G_{q_0}$, which introduces a rotation in $X$ within mode
$q_0$. When labelling the states associated with the regions purple and green,
an additional error is introduced due to the over- and under-approximation of
states associated with each of the two regions. We further show the simulation
of a trajectory under $\pi^*$ with a starting point of (−0.5, −1) in $q_0$, within Fig. 2a.


4.3 Case Study 3 - Scaling in Continuous Dimension of Model 


We now focus on the continuous dynamics by considering a stochastic process
with $Q = \{q_0\}$ (single mode) and dynamics evolving according to (3), characterised
by $A_{q_0} = 0.8 I_d$, $F_{q_0} = 0_d$ and $G_{q_0} = 0.2 I_d$, where $d$ corresponds to the
number of continuous variables. We are interested in checking the LTL specification
$\varphi_3 := \square X_{safe}$, where $X_{safe} = [-1, 1]^d$, as the continuous dimension $d$ of the
model varies. Here “$\square$” is the “always” temporal operator and a trace $\zeta$ satisfies
$\varphi_3$ if $\forall k \ge 0,\ \zeta_k \in X_{safe}$. In view of the focus on scalability for this Case Study
3, we do not discuss the computed probabilities, which we instead covered
in Sect. 4.1.


Implementation. Similar to Case Study 2, we follow lines 3-14 in Listing 1.1 
to define the model dynamics, while we use Listing 1.3 to specify the verification 
task using the IMDP method. For this example, we employ a uniform grid having 
a grid size of 1 and relative tolerance of 1 for each dimension (cf. lines 5-10 in 
Table 2. Case study 3: Verification results of the IMDP-based approach over $\varphi_3$, for
varying dimension d of the stochastic process.

Dimensions [d]   |Q| [states]   Time taken [s]   Error ($\varepsilon_{max}$)
 2                   4               0.004        4.15e-5
 3                  14               0.06         3.34e-5
 4                  30               0.21         2.28e-5
 5                  62               0.90         9.70e-5
 6                 126               4.16         8.81e-6
 7                 254              19.08         1.10e-6
 8                 510              79.63         2.95e-6
 9                1022             319.25         4.50e-7
10                2046            1601.31         1.06e-7
11                4094            5705.47         4.90e-8
12                8190           21134.23         4.89e-8


Listing 1.3). We set K = -1 to represent an unbounded time horizon, p = 1 to
perform verification over a safety property and lb = 3 to use the IMDP method
(cf. lines 12-19 in Listing 1.3). In Table 2 we list the number of states required for
each dimension, the total computational time, and the maximum error associated
with each abstraction.


Outcomes. From Table 2 we can deduce that by employing the IMDP method
within StocHy, the generated abstract models have manageable state spaces,
thanks to the tight error bounds that are obtained. Notice that since the number of
cells per dimension is increased with the dimension d of the model, the associated
abstraction error $\varepsilon_{max}$ is decreased. The small error is also due to the underlying
contractive dynamics of the process. This is a key fact leading to scalability over
the continuous dimension d of the model: StocHy displays a significant improvement
in scalability over the state of the art [28] and allows abstracting stochastic
models with relevant dimensionality. Furthermore, StocHy is capable of handling
specifications over infinite horizons (such as the considered until formula).


4.4 Case Study 4 - Simulations 


For this last case study, we refer to the CO₂ model described in Case Study
1 (Sect. 4.1). We extend the CO₂ model to capture (i) the effect of occupants
leaving or entering the zone within a time step and (ii) the opening or closing of
the windows in the zone [2]. $\rho_m$ is now a control input and is an exogenous
signal. This can be described as a SHS comprising two-dimensional dynamics, over
discrete modes in the set $\{q_0 = (E,C),\ q_1 = (F,C),\ q_2 = (F,O),\ q_3 = (E,O)\}$
describing possible configurations of the room (empty (E) or full (F), and with
windows open (O) or closed (C)). A MC representing the discrete modes and
their dynamics is in Fig. 3a. The continuous variables evolve according to Eq. (6),
which now captures the effect of switching between discrete modes, as


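$$
\begin{aligned}
x_{1,k+1} &= x_{1,k} + \frac{\Delta}{V}\bigl(-\rho_m x_{1,k} + \varrho_{o,c} (C_{out} - x_{1,k})\bigr) + \mathbf{1}_F C_{occ,k} + \sigma_1 w_k, \\
x_{2,k+1} &= x_{2,k} + \frac{\Delta}{C_z}\Bigl(\rho_m C_{pa}(T_{set} - x_{2,k}) + \frac{\varrho_{o,c}}{R}(T_{out} - x_{2,k})\Bigr) + \mathbf{1}_F T_{occ,k} + \sigma_2 w_k,
\end{aligned}
\tag{7}
$$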


where the additional terms are: $\varrho_{(\cdot)}$ is the natural drift air flow that changes
depending on whether the window is open ($\varrho_o$) or closed ($\varrho_c$) [m³/min]; $C_{occ}$ is the
generated CO₂ level when the zone is occupied (it is multiplied by the indicator
     1  // Number of simulations
     2  int monte = 5000;
     3  // Initial continuous variables
     4  arma::mat x_init = arma::zeros<arma::mat>(2,monte);
     5  // Initialise random generators
     6  std::random_device rand_dev;
     7  std::mt19937 generator(rand_dev());
     8  // Define distributions
     9  std::normal_distribution<double> d1{450,25};
    10  std::normal_distribution<double> d2{17,2};
    11  for(size_t i = 0; i < monte; ++i)
    12  {
    13    x_init(0,i) = d1(generator);
    14    x_init(1,i) = d2(generator);
    15  }
    16  // Initial discrete mode q_0 = (E,C)
    17  arma::mat q_init = arma::zeros<arma::mat>(1,monte);
    18  // Definition of control signal
    19  // Read from .txt/.mat file or define here
    20  arma::mat u = readInputSignal("../u.txt");
    21  // Combining
    22  exdata_t data(x_init,u,q_init);

Listing 1.4: Case study 4: Definition of initial conditions for simulation

Fig. 3. Case study 4: (a) MC for the discrete modes of the CO₂ model and (b) input
control signal ($\rho_m$ against time steps).



function $\mathbf{1}_F$) [ppm/min]; $T_{occ}$ is the generated heat due to occupants [°C/min],
which couples the dynamics in (7) as $T_{occ,k} = v x_{1,k} + h$.


Implementation. The provided file CS4.MAT sets the values of the parameters
in (7) and contains the transition probability matrix representing the relationships
between discrete modes. We select a sampling time $\Delta = 15$ [min] and simulate
the evolution of this dynamical model over a fixed time horizon K = 8 h
(i.e. 32 steps) with an initial CO₂ level $x_1 \sim \mathcal{N}(450, 25)$ [ppm] and a temperature
level of $x_2 \sim \mathcal{N}(17, 2)$ [°C]. We define the initial conditions using Listing 1.4.
Line 2 defines the number of Monte Carlo simulations using the variable
monte and sets this to 5000. We instantiate the initial values of the continuous
variables using the term x_init, while we set the initial discrete mode using the
variable q_init. This is done using lines 4-17, which define an independent normal
distribution for each of the continuous variables, from which we sample 5000
points, and set the initial discrete mode
to $q_0 = (E, C)$. We define the control signal $\rho_m$ in line 20, by parsing the file u.txt,
which contains discrete values of $\rho_m$ for each time step (see Fig. 3b). Once the
model is defined, we follow Listing 1.1 to perform the simulation. The simulation


Fig. 4. Case study 4: Simulation single traces for continuous variables (a) $x_1$, (b) $x_2$
and discrete modes (c) $q$. Histogram plots with respect to time step for (d) $x_1$, (e) $x_2$
and discrete modes (f) $q$.


engine also generates a PYTHON script, simPlots.py, which gives the option to 
visualise the simulation outcomes offline. 


Outcomes. The generated simulation plots are shown in Fig. 4, which depicts:
(i) a sample trace for each continuous variable (the evolution of $x_1$ is shown in
Fig. 4a, $x_2$ in Fig. 4b) and for the discrete modes (see Fig. 4c); and (ii) histograms
depicting the range of values the continuous variables can be in during each
time step and the associated count (see Fig. 4d for $x_1$ and Fig. 4e for $x_2$); and a
histogram showing the likelihood of being in a discrete mode within each time
step (see Fig. 4f). The total time taken to generate the simulations is 48.6 [s].
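
A per-step histogram of discrete modes such as the one in Fig. 4f can be obtained by simulating the mode Markov chain many times and counting how often each mode is visited at each step. The sketch below shows this counting under simplifying assumptions: the transition matrix is passed in directly rather than read from CS4.MAT, the continuous dynamics are ignored, and the names Matrix and modeHistogram are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Simulates `runs` independent trajectories of a discrete-time Markov chain
// with row-stochastic transition matrix P, all starting in mode q0, and
// returns counts[k][q] = number of runs that are in mode q at step k.
inline std::vector<std::vector<std::size_t>> modeHistogram(
        const Matrix& P, int q0, std::size_t steps, std::size_t runs,
        std::mt19937& gen) {
    const std::size_t n = P.size();
    std::vector<std::vector<std::size_t>> counts(
        steps + 1, std::vector<std::size_t>(n, 0));

    // One categorical distribution per row of P.
    std::vector<std::discrete_distribution<int>> rows;
    rows.reserve(n);
    for (const auto& r : P) rows.emplace_back(r.begin(), r.end());

    for (std::size_t run = 0; run < runs; ++run) {
        int q = q0;
        ++counts[0][static_cast<std::size_t>(q)];
        for (std::size_t k = 1; k <= steps; ++k) {
            q = rows[static_cast<std::size_t>(q)](gen);  // sample next mode
            ++counts[k][static_cast<std::size_t>(q)];
        }
    }
    return counts;
}
```

Dividing each row of the returned counts by the number of runs gives the empirical probability of being in each mode at that step.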


5 Conclusions and Extensions 


We have presented StocHy, a new software tool for the quantitative analysis of
stochastic hybrid systems. There is a plethora of enticing extensions that we are
planning to explore. In the short term, we intend to: (i) interface with other
model checking tools such as STORM [12] and the MODEST TOOLSET [16]; (ii)
embed algorithms for policy refinement, so we can generate policies for models
having numerous continuous input variables [15]; and (iii) benchmark the
tool against a set of SHS models [10]. In the longer term, we plan to extend
StocHy such that (i) it can employ a graphical user-interface; (ii) it can allow
analysis of continuous-time SHS; and (iii) it can make use of data structures such
as multi-terminal binary decision diagrams [14] to reduce the memory requirements
during the construction of the abstract MDP or IMDP.


Acknowledgements. The authors would like to thank Kurt Degiorgio, Sadegh
Soudjani, Sofie Haesaert, Luca Laurenti, Morteza Lahijanian, Gareth Molyneux and
Viraj Brian Wijesuriya. This work is in part funded by the Alan Turing Institute,
London, and by Malta’s ENDEAVOUR Scholarships Scheme.




N. Cauchi and A. Abate 


References 


1. Abate, A., Prandini, M., Lygeros, J., Sastry, S.: Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44(11), 2724-2734 (2008)
2. Abate, A.: Formal verification of complex systems: model-based and data-driven methods. In: Proceedings of the 15th ACM-IEEE International Conference on Formal Methods and Models for System Design, MEMOCODE 2017, Vienna, Austria, 29 September–02 October 2017, pp. 91-93 (2017)
3. Abate, A., et al.: ARCH-COMP18 category report: stochastic modelling. EPiC Ser. Comput. 54, 71-103 (2018)
4. Abate, A., Katoen, J.P., Lygeros, J., Prandini, M.: Approximate model checking of stochastic hybrid systems. Eur. J. Control 16(6), 624-641 (2010)
5. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)
6. Bauer, C., Frink, A., Kreckel, R.: Introduction to the GiNaC framework for symbolic computation within the C++ programming language. J. Symbolic Comput. 33(1), 1-12 (2002)
7. Blom, H., Lygeros, J. (eds.): Stochastic Hybrid Systems: Theory and Safety Critical Applications. LNCIS, vol. 337. Springer, Heidelberg (2006). https://doi.org/10.1007/11587392
8. Bouissou, M., Elmqvist, H., Otter, M., Benveniste, A.: Efficient Monte Carlo simulation of stochastic hybrid systems. In: Proceedings of the 10th International Modelica Conference, Lund, Sweden, 10-12 March 2014, no. 96, pp. 715-725. Linköping University Electronic Press (2014)
9. Budde, C.E., Dehnert, C., Hahn, E.M., Hartmanns, A., Junges, S., Turrini, A.: JANI: quantitative model and tool interaction. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 151-168. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_9
10. Cauchi, N., Abate, A.: Benchmarks for cyber-physical systems: a modular model library for building automation systems. IFAC-PapersOnLine 51(16), 49-54 (2018). 6th IFAC Conference on Analysis and Design of Hybrid Systems ADHS 2018
11. Cauchi, N., Laurenti, L., Lahijanian, M., Abate, A., Kwiatkowska, M., Cardelli, L.: Efficiency through uncertainty: scalable formal synthesis for stochastic hybrid systems. In: 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC) (2019). arXiv:1901.01576
12. Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 592-600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
13. Everdij, M.H., Blom, H.A.: Hybrid Petri Nets with diffusion that have into-mappings with generalised stochastic hybrid processes. In: Blom, H.A.P., Lygeros, J. (eds.) Stochastic Hybrid Systems. LNCIS, vol. 337, pp. 31-63. Springer, Heidelberg (2006). https://doi.org/10.1007/11587392_2
14. Fujita, M., McGeer, P.C., Yang, J.Y.: Multi-terminal binary decision diagrams: an efficient data structure for matrix representation. Formal Methods Syst. Des. 10(2-3), 149-169 (1997)
15. Haesaert, S., Cauchi, N., Abate, A.: Certified policy synthesis for general Markov decision processes: an application in building automation systems. Perform. Eval. 117, 75-103 (2017)
16. Hahn, E.M., Hartmanns, A., Hermanns, H., Katoen, J.P.: A compositional modelling and analysis framework for stochastic hybrid systems. Formal Methods Syst. Des. 43(2), 191-232 (2013)
17. Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: iscasMc: a web-based probabilistic model checker. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 312-317. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_22
18. Hartmanns, A., Hermanns, H.: The modest toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593-598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
19. Krystul, J., Blom, H.A.: Sequential Monte Carlo simulation of rare event probability in stochastic hybrid systems. IFAC Proc. Volumes 38(1), 176-181 (2005)
20. Kupferman, O., Vardi, M.Y.: Model checking of safety properties. Formal Methods Syst. Des. 19(3), 291-314 (2001)
21. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585-591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
22. Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Trans. Autom. Control 60(8), 2031-2045 (2015)
23. Larsen, K.G., Mikučionis, M., Muñiz, M., Srba, J., Taankvist, J.H.: Online and compositional learning of controllers with application to floor heating. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 244-259. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_14
24. Sanderson, C., Curtin, R.: Armadillo: a template-based C++ library for linear algebra. J. Open Source Softw. 1, 26-32 (2016)
25. Sanderson, C., Curtin, R.: A user-friendly hybrid sparse matrix class in C++. In: Davenport, J.H., Kauers, M., Labahn, G., Urban, J. (eds.) ICMS 2018. LNCS, vol. 10931, pp. 422-430. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96418-8_50
26. Škulj, D.: Discrete time Markov chains with interval probabilities. Int. J. Approx. Reason. 50(8), 1314-1329 (2009)
27. Soudjani, S.E.Z.: Formal abstractions for automated verification and synthesis of stochastic systems. Ph.D. thesis, TU Delft (2014)
28. Soudjani, S.E.Z., Gevaerts, C., Abate, A.: FAUST²: formal abstractions of uncountable-STate STochastic processes. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 272-286. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_23
29. Střelec, M., Macek, K., Abate, A.: Modeling and simulation of a microgrid as a stochastic hybrid system. In: 2012 3rd IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), pp. 1-9, October 2012
30. Summers, S., Lygeros, J.: Verification of discrete time stochastic hybrid systems: a stochastic reach-avoid decision problem. Automatica 46(12), 1951-1961 (2010)
31. Cauchi, N., Abate, A.: Artifact and instructions to generate experimental results for TACAS 2019 paper: StocHy: automated verification and synthesis of stochastic processes (artifact). Figshare (2019). https://doi.org/10.6084/m9.figshare.7819487.v1



Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Synthesis of Symbolic Controllers: 
A Parallelized and Sparsity-Aware 
Approach 


Mahmoud Khaled¹, Eric S. Kim², Murat Arcak², and Majid Zamani³,⁴


¹ Department of Electrical and Computer Engineering,
Technical University of Munich, Munich, Germany
khaled.mahmoud@tum.de
² Department of Electrical Engineering and Computer Sciences,
University of California Berkeley, Berkeley, CA, USA
{eskim, arcak}@berkeley.edu
³ Department of Computer Science, University of Colorado Boulder, Boulder, USA
majid.zamani@colorado.edu
⁴ Department of Computer Science, Ludwig Maximilian University of Munich,
Munich, Germany


Abstract. The correctness of control software in many safety-critical
applications such as autonomous vehicles is crucial. One approach to
achieving this goal is through “symbolic control”, where complex physical
systems are approximated by finite-state abstractions. Then, using those
abstractions, provably-correct digital controllers are algorithmically synthesized
for concrete systems, satisfying some complex high-level requirements.
Unfortunately, the complexity of constructing such abstractions
and synthesizing their controllers grows exponentially in the number of
state variables in the system. This limits its applicability to simple physical
systems.

This paper presents a unified approach that utilizes sparsity of the 
interconnection structure in dynamical systems for both construction 
of finite abstractions and synthesis of symbolic controllers. In addition, 
parallel algorithms are proposed to target high-performance comput- 
ing (HPC) platforms and Cloud-computing services. The results show 
remarkable reductions in computation times. In particular, we demon- 
strate the effectiveness of the proposed approach on a 7-dimensional 
model of a BMW 320i car by designing a controller to keep the car 
in the travel lane unless it is blocked. 


1 Introduction 


Recently, the world has witnessed many emerging safety-critical applications
such as smart buildings, autonomous vehicles and smart grids. These applications
are examples of cyber-physical systems (CPS). In CPS, embedded control
software plays a significant role by monitoring and controlling several physical
variables, such as pressure or velocity, through multiple sensors and actuators,
and communicates with other systems or with supporting computing servers.
A novel approach to designing provably correct embedded control software in an
automated fashion is via formal method techniques [10,11], and in particular
symbolic control.

Symbolic control provides algorithmically provably-correct controllers based 
on the dynamics of physical systems and some given high-level requirements. 
In symbolic control, physical systems are approximated by finite abstractions 
and then discrete (a.k.a. symbolic) controllers are automatically synthesized for 
those abstractions, using automata-theoretic techniques [5]. Finally, those con- 
trollers will be refined to hybrid ones applicable to the original physical systems. 
Unlike traditional design-then-test workflows, merging design phases with for- 
mal verification ensures that controllers are certified-by-construction. Current 
implementations of symbolic control, unfortunately, take a monolithic view of 
systems, where the entire system is modeled, abstracted, and a controller is syn- 
thesized from the overall state sets. This view interacts poorly with the symbolic 
approach, whose complexity grows exponentially in the number of state variables 
in the model. Consequently, the technique is limited to small dynamical systems. 


1.1 Related Work 


Recently, two promising techniques were proposed for mitigating the computational
complexity of symbolic controller synthesis. The first technique [2] utilizes
sparsity of internal interconnection of dynamical systems to efficiently construct
their finite abstractions. It is only presented for constructing abstractions while
controller synthesis is still performed monolithically without taking into account
the sparse structure. The second technique [4] provides parallel algorithms targeting
high-performance computing (HPC) platforms, but suffers from the state-explosion
problem when the number of parallel processing elements (PE) is fixed.
We briefly discuss each of those techniques and propose an approach that efficiently
utilizes both of them.

Many abstraction techniques implemented in existing tools, including SCOTS 
[9], traverse the state space in a brute force way and suffer from an exponen- 
tial runtime with respect to the number of state variables. The authors of [2] 
note that a majority of continuous-space systems exhibit a coordinate structure, 
where the governing equation for each state variable is defined independently. 
When the equations depend only on a few continuous variables, then they are 
said to be sparse. They proposed a modification to the traditional brute-force 
procedure to take advantage of such sparsity only in constructing abstractions. 
Unfortunately, the authors do not leverage sparsity to improve synthesis of sym- 
bolic controllers, which is, practically, more computationally complex. In this 
paper, we propose a parallel implementation of their technique to utilize HPC 
platforms. We also show how sparsity can be utilized, using a parallel implemen- 
tation, during the controller synthesis phase as well. 

The framework pFaces [4] is introduced as an acceleration ecosystem for 
implementations of symbolic control techniques. Parallel implementations of the
abstraction and synthesis algorithms, which were originally done serially in
SCOTS [9], are introduced as computation kernels in pFaces. The proposed
algorithms treat the problem as a data-parallel task and they scale remarkably well
as the number of PEs increases. pFaces allows controlling the complexity of
symbolic controller synthesis by adding more PEs. The results introduced in
[4] outperform all existing tools for abstraction construction and controller
synthesis. However, for a fixed number of PEs, the algorithms still suffer from the
state-explosion problem.

In this paper, we propose parallel algorithms that utilize the sparsity of the 
interconnection in the construction of abstraction and controller synthesis. In 
particular, the main contributions of this paper are twofold: 


(1) We introduce a parallel algorithm for constructing abstractions with a dis- 
tributed data container. The algorithm utilizes sparsity and can run on 
HPC platforms. We implement it in the framework of pFaces, where it shows a
remarkable reduction in computation time compared to the results in [2].

(2) We introduce a parallel algorithm that integrates sparsity of dynamical sys- 
tems into the controller synthesis phase. Specifically, a sparsity-aware pre- 
processing step concentrates computational resources in a small relevant 
subset of the state-input space. This algorithm returns the same result as 
the monolithic procedure, while exhibiting lower runtimes. To the best of 
our knowledge, the proposed algorithm is the first to merge parallelism with 
sparsity in the context of symbolic controller synthesis. 


2 Preliminaries 


Given two sets $A$ and $B$, we denote by $|A|$ the cardinality of $A$, by $2^A$ the
set of all subsets of $A$, by $A \times B$ the Cartesian product of $A$ and $B$, and by
$A \setminus B$ the set difference between the sets $A$ and $B$. Set $\mathbb{R}^n$ represents
the $n$-dimensional Euclidean space of real numbers. This symbol is annotated
with subscripts to restrict it in the obvious way, e.g., $\mathbb{R}^n_+$ denotes the positive
(component-wise) $n$-dimensional vectors. We denote by $\pi_A : A \times B \to A$ the
natural projection map on $A$ and define it, for a set $C \subseteq A \times B$, as follows:
$\pi_A(C) = \{a \in A \mid \exists_{b \in B}\ (a,b) \in C\}$. Given a map $R : A \to B$ and a set $\mathcal{A} \subseteq A$,
we define $R(\mathcal{A}) := \bigcup_{a \in \mathcal{A}} \{R(a)\}$. Similarly, given a set-valued map $Z : A \to 2^B$
and a set $\mathcal{A} \subseteq A$, we define $Z(\mathcal{A}) := \bigcup_{a \in \mathcal{A}} Z(a)$.

We consider general discrete-time nonlinear dynamical systems given in the
form of the update equation:

$$\Sigma :\; x^{+} = f(x,u), \tag{1}$$


where $x \in X \subseteq \mathbb{R}^n$ is a state vector and $u \in U \subseteq \mathbb{R}^m$ is an input vector. The
system is assumed to start from some initial state $x(0) = x_0 \in X$ and the map
$f$ is used to update the state of the system every $\tau$ seconds. Let set $\bar{X}$ be a
finite partition on $X$ constructed by a set of hyper-rectangles of identical widths
$\eta \in \mathbb{R}^n_+$ and let set $\bar{U}$ be a finite subset of $U$. A finite abstraction of (1) is a
finite-state system $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$, where $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$ is a transition relation
crafted so that there exists a feedback-refinement relation (FRR) $\mathcal{R} \subseteq X \times \bar{X}$
from $\Sigma$ to $\bar{\Sigma}$. Interested readers are referred to [8] for details about FRRs and
their usefulness in synthesizing controllers for concrete systems using their finite
abstractions.

Fig. 1. The sparsity graph of the vehicle example as introduced in [2].

For a system $\Sigma$, an update-dependency graph is a directed graph of vertices
representing input variables $\{u_1, u_2, \dots, u_m\}$, state variables $\{x_1, x_2, \dots, x_n\}$,
and updated state variables $\{x_1^+, x_2^+, \dots, x_n^+\}$, and edges that connect input
(resp. state) variables to the affected updated state variables based on map $f$.
For example, Fig. 1 depicts the update-dependency graph of the vehicle case
study presented in [2] with the update equation:

$$\begin{bmatrix} x_1^+ \\ x_2^+ \\ x_3^+ \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_3, u_1, u_2) \\ f_2(x_2, x_3, u_1, u_2) \\ f_3(x_3, u_1, u_2) \end{bmatrix},$$

for some nonlinear functions $f_1$, $f_2$, and $f_3$. The state variable $x_3$ affects all
updated state variables $x_1^+$, $x_2^+$, and $x_3^+$. Hence, the graph has edges connecting
$x_3$ to $x_1^+$, $x_2^+$, and $x_3^+$, respectively. As update-dependency graphs become denser,
sparsity of their corresponding abstract systems is reduced. The same graph
applies to the abstract system $\bar{\Sigma}$.

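We sometimes refer to $\bar{X}$, $\bar{U}$, and $T$ as monolithic state set, monolithic input
set and monolithic transition relation, respectively. A generic projection map

$$P_i^f : A \to \pi^i(A)$$

is used to extract elements of the corresponding subsets affecting the updated
state $\bar{x}_i^+$. Note that $A \subseteq \bar{X} := \bar{X}_1 \times \bar{X}_2 \times \dots \times \bar{X}_n$ when we are interested in
extracting subsets of the state set and $A \subseteq \bar{U} := \bar{U}_1 \times \bar{U}_2 \times \dots \times \bar{U}_m$ when we are
interested in extracting subsets of the input set. When extracting subsets of the
state set, $\pi^i$ is the projection map $\pi_{\bar{X}_{k_1} \times \bar{X}_{k_2} \times \dots \times \bar{X}_{k_K}}$, where $k_j \in \{1, 2, \dots, n\}$,
$j \in \{1, 2, \dots, K\}$, and $\bar{X}_{k_1} \times \bar{X}_{k_2} \times \dots \times \bar{X}_{k_K}$ is a subset of states affecting
the updated state variable $\bar{x}_i^+$. Similarly, when extracting subsets of the input
set, $\pi^i$ is the projection map $\pi_{\bar{U}_{p_1} \times \bar{U}_{p_2} \times \dots \times \bar{U}_{p_P}}$, where $p_j \in \{1, 2, \dots, m\}$,
$j \in \{1, 2, \dots, P\}$, and $\bar{U}_{p_1} \times \bar{U}_{p_2} \times \dots \times \bar{U}_{p_P}$ is a subset of inputs affecting the updated
state variable $\bar{x}_i^+$.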

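For example, assume that the monolithic state (resp. input) set of the system
$\bar{\Sigma}$ in Fig. 1 is given by $\bar{X} := \bar{X}_1 \times \bar{X}_2 \times \bar{X}_3$ (resp. $\bar{U} := \bar{U}_1 \times \bar{U}_2$) such that for
any $\bar{x} := (\bar{x}_1, \bar{x}_2, \bar{x}_3) \in \bar{X}$ and $\bar{u} := (\bar{u}_1, \bar{u}_2) \in \bar{U}$, one has $\bar{x}_1 \in \bar{X}_1$, $\bar{x}_2 \in \bar{X}_2$,
$\bar{x}_3 \in \bar{X}_3$, $\bar{u}_1 \in \bar{U}_1$, and $\bar{u}_2 \in \bar{U}_2$. Now, based on the dependency graph, $P_1^f(\bar{x}) :=
\pi_{\bar{X}_1 \times \bar{X}_3}(\bar{x}) = (\bar{x}_1, \bar{x}_3)$ and $P_1^f(\bar{u}) := \pi_{\bar{U}_1 \times \bar{U}_2}(\bar{u}) = (\bar{u}_1, \bar{u}_2)$. We can also apply
the map to subsets of $\bar{X}$ and $\bar{U}$, e.g., $P_1^f(\bar{X}) = \bar{X}_1 \times \bar{X}_3$, and $P_1^f(\bar{U}) = \bar{U}_1 \times \bar{U}_2$.

For a transition element $t = (\bar{x}, \bar{u}, \bar{x}') \in T$, we define $P_i^f(t) := (P_i^f(\bar{x}),
P_i^f(\bar{u}), \pi_{\bar{X}_i}(\bar{x}'))$, for any component $i \in \{1, 2, \dots, n\}$. Note that for $t$, the
successor state $\bar{x}'$ is treated differently as it is related directly to the updated state
variable $\bar{x}_i^+$. We can apply the map to subsets of $T$, e.g., for the given
update-dependency graph in Fig. 1, one has $P_1^f(T) = \bar{X}_1 \times \bar{X}_3 \times \bar{U}_1 \times \bar{U}_2 \times \bar{X}_1$.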
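
The projection map can be illustrated with a small sketch for the three-state example of Fig. 1. The dictionary `DEPS`, the function names, and the toy domains are hypothetical and only mirror the dependency graph described above; indices are 0-based.

```python
from itertools import product

# Hypothetical encoding of the update-dependency graph of Fig. 1:
# component i -> (indices of states, indices of inputs) that affect x_i^+.
DEPS = {
    0: ((0, 2), (0, 1)),  # x1+ depends on x1, x3, u1, u2
    1: ((1, 2), (0, 1)),  # x2+ depends on x2, x3, u1, u2
    2: ((2,), (0, 1)),    # x3+ depends on x3, u1, u2
}

def project(i, elements, kind="state"):
    """P_i^f on a set of tuples: keep only the coordinates affecting x_i^+."""
    idx = DEPS[i][0] if kind == "state" else DEPS[i][1]
    return {tuple(e[k] for k in idx) for e in elements}

def project_transition(i, t):
    """P_i^f on one transition (x, u, x'): the successor keeps coordinate i only."""
    x, u, x_next = t
    sx, su = DEPS[i]
    return (tuple(x[k] for k in sx), tuple(u[k] for k in su), (x_next[i],))

# Toy abstract state set with 2, 3, and 4 cells per dimension (assumed sizes).
X1, X2, X3 = range(2), range(3), range(4)
X = set(product(X1, X2, X3))
P1X = project(0, X)  # corresponds to X1 x X3
tp = project_transition(0, ((0, 1, 2), (5, 6), (1, 2, 3)))
```

Here `P1X` has 8 elements although `X` has 24, which is the reduction that the sparsity-aware algorithms exploit.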

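On the other hand, a generic recovery map

$$D_i^f : P_i^f(A) \to 2^A$$

is used to recover elements (resp. subsets) from the projected subsets back to
their original monolithic sets. Similarly, $A \subseteq \bar{X} := \bar{X}_1 \times \bar{X}_2 \times \dots \times \bar{X}_n$ when we
are interested in subsets of the state set and $A \subseteq \bar{U} := \bar{U}_1 \times \bar{U}_2 \times \dots \times \bar{U}_m$ when
we are interested in subsets of the input set.

For the same example in Fig. 1, let $\bar{x} := (\bar{x}_1, \bar{x}_2, \bar{x}_3) \in \bar{X}$ be a state. Now,
define $\bar{x}_p := P_1^f(\bar{x}) = (\bar{x}_1, \bar{x}_3)$. We then have $D_1^f(\bar{x}_p) := \{(\bar{x}_1, \bar{x}_2^*, \bar{x}_3) \mid \bar{x}_2^* \in \bar{X}_2\}$.
Similarly, for a transition element $t := ((\bar{x}_1, \bar{x}_2, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1', \bar{x}_2', \bar{x}_3')) \in T$
and its projection $t_p := P_1^f(t) = ((\bar{x}_1, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1'))$, the recovered set of
transitions is $D_1^f(t_p) = \{((\bar{x}_1, \bar{x}_2^*, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1', \bar{x}_2^{**}, \bar{x}_3^{*})) \mid \bar{x}_2^* \in \bar{X}_2,\ \bar{x}_2^{**} \in
\bar{X}_2, \text{ and } \bar{x}_3^{*} \in \bar{X}_3\}$.

Given a subset $\widetilde{X} \subseteq \bar{X}$, let $[\widetilde{X}] := D_i^f \circ P_i^f(\widetilde{X})$. Note that $[\widetilde{X}]$ is not necessarily
equal to $\widetilde{X}$. However, we have that $\widetilde{X} \subseteq [\widetilde{X}]$. Here, $[\widetilde{X}]$ over-approximates $\widetilde{X}$.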
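
A minimal sketch of the recovery map, assuming finite domains stored as Python ranges, may help to see why projecting and then recovering over-approximates a set. The helper names `project` and `recover` are hypothetical and not taken from any tool.

```python
from itertools import product

def project(elements, kept):
    """Keep only the coordinates listed in `kept` (0-based indices)."""
    return {tuple(e[k] for k in kept) for e in elements}

def recover(projected, kept, domains):
    """D_i^f: lift projected tuples back to the monolithic set by letting every
    dropped coordinate range over its whole domain."""
    out = set()
    for p in projected:
        fixed = dict(zip(kept, p))
        axes = [[fixed[k]] if k in fixed else list(domains[k])
                for k in range(len(domains))]
        out.update(product(*axes))
    return out

domains = [range(2), range(3), range(4)]   # assumed sizes of X1, X2, X3
subset = {(0, 0, 0), (1, 2, 3)}            # an arbitrary subset of X
hull = recover(project(subset, (0, 2)), (0, 2), domains)
```

In this toy case `hull` contains 6 states: the two original ones plus every state that agrees with them on the first and third coordinates.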

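For an update map $f$ in (1), a function $\Omega^f : \bar{X} \times \bar{U} \to X \times X$ characterizes
hyper-rectangles that over-approximate the reachable sets starting from a set
$\bar{x} \in \bar{X}$ when the input $\bar{u}$ is applied. For example, if a growth bound map ($\beta :
\mathbb{R}^n \times U \to \mathbb{R}^n$) is used, $\Omega^f$ can be defined as follows:

$$\Omega^f(\bar{x}, \bar{u}) = (x_{lb}, x_{ub}) := (-r + f(\bar{x}_c, \bar{u}),\ r + f(\bar{x}_c, \bar{u})),$$

where $r = \beta(\eta/2, \bar{u})$, and $\bar{x}_c \in \bar{x}$ denotes the centroid of $\bar{x}$. Here, $\beta$ is the growth
bound introduced in [8, Section VII]. An over-approximation of the reachable
sets can then be obtained by the map $O^f : \bar{X} \times \bar{U} \to 2^{\bar{X}}$ defined by:

$$O^f(\bar{x}, \bar{u}) := Q \circ \Omega^f(\bar{x}, \bar{u}),$$

where $Q$ is a quantization map defined by:

$$Q(x_{lb}, x_{ub}) = \{\bar{x}' \in \bar{X} \mid \bar{x}' \cap [\![x_{lb}, x_{ub}]\!] \neq \emptyset\}, \tag{2}$$

where $[\![x_{lb}, x_{ub}]\!] = [x_{lb,1}, x_{ub,1}] \times [x_{lb,2}, x_{ub,2}] \times \dots \times [x_{lb,n}, x_{ub,n}]$.

We also assume that $O^f$ can be decomposed component-wise (i.e., for each
dimension $i \in \{1, 2, \dots, n\}$) such that for any $(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$, $O^f(\bar{x}, \bar{u}) =
\bigcap_{i=1}^{n} D_i^f(O_i^f(P_i^f(\bar{x}), P_i^f(\bar{u})))$, where $O_i^f : P_i^f(\bar{X}) \times P_i^f(\bar{U}) \to 2^{\bar{X}_i}$ is an
over-approximation function restricted to component $i \in \{1, 2, \dots, n\}$ of $f$. The same
assumption applies to the underlying characterization function $\Omega^f$.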
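
The maps above can be sketched for a simple two-dimensional system $x_i^+ = x_i + \tau u_i$, for which $\beta(r, u) = r$ is a valid growth bound because the update preserves distances between states. The grid parameters `ETA`, `LOW`, `N`, the half-open cell convention, and the choice to return an empty successor set when the box leaves the grid are assumptions of this sketch, not details confirmed by the text.

```python
import math
from itertools import product

ETA = (1.0, 1.0)   # cell widths (hypothetical)
LOW = (0.0, 0.0)   # lower-left corner of the state set (hypothetical)
N = (5, 5)         # number of cells per dimension (hypothetical)
TAU = 1.0

def f(x, u):
    return tuple(xi + TAU * ui for xi, ui in zip(x, u))

def beta(r, u):
    return r       # growth bound valid for this particular f

def centroid(cell):
    return tuple(l + (c + 0.5) * e for l, c, e in zip(LOW, cell, ETA))

def omega(cell, u):
    """Characterization: lower and upper corners of the reachable-set box."""
    r = beta(tuple(e / 2 for e in ETA), u)
    fx = f(centroid(cell), u)
    return (tuple(a - b for a, b in zip(fx, r)),
            tuple(a + b for a, b in zip(fx, r)))

def quantize(x_lb, x_ub):
    """Quantization map Q: cells [a, a + eta) hit by the closed box.
    Returns an empty set if the box leaves the grid (assumed convention)."""
    ranges = []
    for lb, ub, low, eta, n in zip(x_lb, x_ub, LOW, ETA, N):
        lo, hi = math.floor((lb - low) / eta), math.floor((ub - low) / eta)
        if lo < 0 or hi >= n:
            return set()
        ranges.append(range(lo, hi + 1))
    return set(product(*ranges))

post = quantize(*omega((1, 1), (0.5, 0.5)))   # discrete successors of cell (1, 1)
```

Storing only the pair returned by `omega` and calling `quantize` later is the kind of deferred computation that keeps the memory footprint small.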
Algorithm 1: Serial algorithm for constructing abstractions (SA).

Input: $\bar{X}$, $\bar{U}$, $O^f$
Output: A transition relation $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$.

1  $T \leftarrow \emptyset$;    ▷ Initialize the set of transitions
2  for all $\bar{x} \in \bar{X}$ do
3      for all $\bar{u} \in \bar{U}$ do
4          for all $\bar{x}' \in O^f(\bar{x}, \bar{u})$ do
5              $T \leftarrow T \cup \{(\bar{x}, \bar{u}, \bar{x}')\}$;    ▷ Add a new transition
6          end
7      end
8  end
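
As a rough illustration, the serial algorithm can be written in a few lines. The one-dimensional chain system and its `post` function are hypothetical stand-ins for the over-approximation map.

```python
def serial_abstraction(states, inputs, post):
    """Serial construction: visit every state-input pair and collect successors."""
    transitions = set()
    for x in states:
        for u in inputs:
            for x_next in post(x, u):
                transitions.add((x, u, x_next))
    return transitions

# Toy system: a chain of 4 cells; inputs move one cell left or right.
states = range(4)
inputs = (-1, +1)

def post(x, u):
    y = x + u
    return {y} if 0 <= y < 4 else set()   # moves leaving the chain are dropped

T = serial_abstraction(states, inputs, post)
```

The two outer loops visit all 8 state-input pairs, so the work grows with the size of the full product of states and inputs.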


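Algorithm 2: Serial sparsity-aware algorithm for constructing abstractions
(Sparse-SA) as introduced in [2].

Input: $\bar{X}$, $\bar{U}$, $O^f$
Output: A transition relation $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$.

1  $T \leftarrow \bar{X} \times \bar{U} \times \bar{X}$;    ▷ Initialize the set of transitions
2  for all $i \in \{1, 2, \dots, n\}$ do
3      $T_i \leftarrow \mathrm{SA}(P_i^f(\bar{X}), P_i^f(\bar{U}), O_i^f)$;    ▷ Transitions of sub-spaces
4      $T \leftarrow T \cap D_i^f(T_i)$;    ▷ Add transitions of sub-spaces
5  end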
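
The following sketch contrasts the sparsity-aware construction with a monolithic one on a toy two-dimensional system whose components are fully decoupled. Instead of materializing the intersection of recovered sets, membership is tested component by component; this is a simplification chosen for the sketch, and all names (`DEPS`, `post_i`, `in_monolithic`) are hypothetical.

```python
from itertools import product

DOM_X = [range(3), range(3)]              # cells per state dimension (assumed)
DOM_U = [(-1, 0, 1), (-1, 0, 1)]          # input values per dimension (assumed)
DEPS = {0: ((0,), (0,)), 1: ((1,), (1,))}  # x_i^+ depends on x_i and u_i only

def post_i(i, xs, us):
    """Component successor map: uses only the projected state and input."""
    y = xs[0] + us[0]
    return {y} if 0 <= y < 3 else set()

def sparse_sa():
    comp = {}
    for i, (sx, su) in DEPS.items():
        Ti = set()
        for xs in product(*(DOM_X[k] for k in sx)):
            for us in product(*(DOM_U[k] for k in su)):
                for y in post_i(i, xs, us):
                    Ti.add((xs, us, y))
        comp[i] = Ti
    return comp

def in_monolithic(comp, x, u, x_next):
    """Membership in the intersection of the recovered component sets."""
    return all(
        (tuple(x[k] for k in sx), tuple(u[k] for k in su), x_next[i]) in comp[i]
        for i, (sx, su) in DEPS.items()
    )

def monolithic():
    T = set()
    for x in product(*DOM_X):
        for u in product(*DOM_U):
            ys = [post_i(i, (x[i],), (u[i],)) for i in range(2)]
            for y in product(*ys):
                T.add((x, u, y))
    return T

comp = sparse_sa()
recovered = {(x, u, y)
             for x in product(*DOM_X) for u in product(*DOM_U)
             for y in product(*DOM_X) if in_monolithic(comp, x, u, y)}
```

The component-wise pass enumerates 9 + 9 state-input pairs, whereas the monolithic pass enumerates 81, yet both describe the same transition relation in this example.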


3 Sparsity-Aware Distributed Constructions 
of Abstractions 


Traditionally, constructing $\bar{\Sigma}$ is achieved monolithically and sequentially. This
includes current state-of-the-art tools, e.g., SCOTS [9], PESSOA [6], CoSyMa [7], and
SENSE [3]. More precisely, such tools have implementations that serially traverse
each element $(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$ to compute a set of transitions $\{(\bar{x}, \bar{u}, \bar{x}') \mid \bar{x}' \in
O^f(\bar{x}, \bar{u})\}$. Algorithm 1 presents the traditional serial algorithm (denoted by SA)
for constructing $\bar{\Sigma}$.

The drawback of this exhaustive search was mitigated by the technique
introduced in [2], which utilizes the sparsity of $\bar{\Sigma}$. The authors suggest constructing
$T$ by applying Algorithm 1 to subsets of each component. Algorithm 2 presents
a sparsity-aware serial algorithm (denoted by Sparse-SA) for constructing $\bar{\Sigma}$, as
introduced in [2]. If we assume a bounded number of elements in subsets of each
component (i.e., $|P_i^f(\bar{X})|$ and $|P_i^f(\bar{U})|$ from line 3 in Algorithm 2), we would
expect a near-linear complexity of the algorithm with respect to the number of
components $n$. This is clearly not the case in [2, Figure 3] as the authors decided to
use Binary Decision Diagrams (BDD) to represent transition relation $T$.

Clearly, representing T as a single storage entity is a drawback in Algorithm 2. 
All component-wise transition sets T; will eventually need to push their results 
into T. This hinders any attempt to parallelize it unless a lock-free data structure 
is used, which affects the performance dramatically. 
Algorithm 3: Proposed sparsity-aware parallel algorithm for constructing
discrete abstractions.

Input: $\bar{X}$, $\bar{U}$, $\Omega^f$
Output: A list of characteristic sets: $K := \bigcup_{p=1}^{P} \bigcup_{i=1}^{n} K^p_{loc,i}$.

1  for all $i \in \{1, 2, \dots, n\}$ do
2      for all $p \in \{1, 2, \dots, P\}$ do
3          $K^p_{loc,i} \leftarrow \emptyset$;    ▷ Initialize local containers
4      end
5  end
6  for all $i \in \{1, 2, \dots, n\}$ in parallel do
7      for all $(\bar{x}, \bar{u}) \in P_i^f(\bar{X}) \times P_i^f(\bar{U})$ in parallel with index $j$ do
8          $p = I(i, j)$;    ▷ Identify target PE
9          $(x_{lb}, x_{ub}) \leftarrow \Omega^f(\bar{x}, \bar{u})$;    ▷ Calculate characteristics
10         $K^p_{loc,i} \leftarrow K^p_{loc,i} \cup \{(\bar{x}, \bar{u}, (x_{lb}, x_{ub}))\}$;    ▷ Store characteristics
11     end
12 end
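[Figure 2: the three subsets $P_1^f(\bar{X}) \times P_1^f(\bar{U}) = (\bar{X}_1 \times \bar{X}_3) \times (\bar{U}_1 \times \bar{U}_2)$, $P_2^f(\bar{X}) \times P_2^f(\bar{U}) = (\bar{X}_2 \times \bar{X}_3) \times (\bar{U}_1 \times \bar{U}_2)$, and $P_3^f(\bar{X}) \times P_3^f(\bar{U}) = \bar{X}_3 \times (\bar{U}_1 \times \bar{U}_2)$, each split into two partition elements that use the containers $K^1_{loc,1}$, $K^2_{loc,1}$, $K^3_{loc,2}$, $K^4_{loc,2}$, $K^5_{loc,3}$, and $K^6_{loc,3}$.]

Fig. 2. An example of task distribution for the parallel sparsity-aware abstraction.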


On the other hand, Algorithm 2 in [4] introduces a technique for constructing
$\bar{\Sigma}$ by using a distributed data container to maintain the transition set $T$ without
constructing it explicitly. In [4], using a continuous over-approximation $\Omega^f$ is
favored as opposed to the discrete over-approximation $O^f$ since it requires less
memory in practice. The actual computation of transitions (i.e., using $O^f$ to
compute discrete successor states) is delayed to the synthesis phase and done
on the fly. The parallel algorithm scales remarkably well with respect to the number
of PEs, denoted by $P$, since the task is parallelizable with no data dependency.
However, it still handles the problem monolithically, which means that, for a fixed $P$,
it will probably not scale as the system dimension $n$ grows.

We then introduce Algorithm 3, which utilizes sparsity to construct $\bar{\Sigma}$ in
parallel and is a combination of Algorithm 2 in [4] and Algorithm 2. Function
$I : \mathbb{N}_+ \setminus \{\infty\} \times \mathbb{N}_+ \setminus \{\infty\} \to \{1, 2, \dots, P\}$ maps a parallel job (i.e., lines 9 and
10 inside the inner parallel for-all statement), for a component $i$ and a tuple
$(\bar{x}, \bar{u})$ with index $j$, to a PE with an index $p = I(i, j)$. $K^p_{loc,i}$ stores the
characterizations of the abstraction of the $i$th component and is located in the PE of index $p$.
Collectively, $K^1_{loc,1}, \dots, K^p_{loc,i}, \dots, K^P_{loc,n}$ constitute a distributed container that
stores the abstraction of the system.
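
The data layout of the distributed container can be mimicked with one worker per PE, each writing only to its own local containers. This is only a sketch of the idea: the round-robin mapping `pe_of`, the placeholder `omega_i`, and the toy domains are assumptions, and Python threads illustrate the structure rather than deliver real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

P = 4                                        # number of PEs (hypothetical)
DOM_X = [range(4), range(4), range(4)]       # cells per state dimension (assumed)
DOM_U = [range(2), range(2)]                 # input values per dimension (assumed)
DEPS = {0: ((0, 2), (0, 1)), 1: ((1, 2), (0, 1)), 2: ((2,), (0, 1))}

def omega_i(i, xs, us):
    """Placeholder characterization: a (lower, upper) pair for component i."""
    c = xs[0] + 0.5 + 0.1 * sum(us)          # hypothetical dynamics
    return (c - 0.5, c + 0.5)

def pe_of(i, j):
    """Mapping I(i, j): round-robin assignment of job j of component i."""
    return (i + j) % P

def jobs():
    for i, (sx, su) in DEPS.items():
        pairs = product(product(*(DOM_X[k] for k in sx)),
                        product(*(DOM_U[k] for k in su)))
        for j, (xs, us) in enumerate(pairs):
            yield pe_of(i, j), i, xs, us

def run_pe(p, my_jobs):
    local = {i: [] for i in DEPS}            # local containers of PE p
    for i, xs, us in my_jobs:
        local[i].append((xs, us, omega_i(i, xs, us)))
    return p, local

by_pe = {p: [] for p in range(P)}
for p, i, xs, us in jobs():
    by_pe[p].append((i, xs, us))

with ThreadPoolExecutor(max_workers=P) as pool:
    K = dict(pool.map(lambda item: run_pe(*item), by_pe.items()))

stored = sum(len(K[p][i]) for p in K for i in DEPS)
```

No worker ever touches another worker's container, so no locking is needed; the containers together hold 144 characterizations, compared with the 256 pairs of the monolithic state-input space in this toy setting.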

Figure 2 depicts an example of the job and task distributions for the example
presented in Fig. 1. Here, we use $P = 6$ with a mapping $I$ that distributes one
partition element of one subset $P_i^f(\bar{X}) \times P_i^f(\bar{U})$ to one PE. We also assume that
the used PEs have equal computation power. Consequently, we try to divide
each subset $P_i^f(\bar{X}) \times P_i^f(\bar{U})$ into two equal partition elements such that we
have, in total, 6 similar computation spaces. Inside each partition element, we
indicate which distributed storage container $K^p_{loc,i}$ is used.

Fig. 3. Comparison between the serial and parallel algorithms for constructing
abstractions of a traffic network model by varying the dimensions.

To assess the distributed algorithm in comparison with the serial one presented
in [2], we implement it in pFaces. We use the same traffic model presented in [2,
Subsection VI-B] and the same parameters. For this example, the authors of [2]
construct $T_i$, for each component $i \in \{1, 2, \dots, n\}$. They combine them
incrementally in a BDD that represents $T$. A monolithic construction of $T$ from $T_i$ is
required in [2] since symbolic controller synthesis is done monolithically. On the
other hand, using $K^p_{loc,i}$ in our technique plays a major role in reducing the
complexity of constructing higher dimensional abstractions. In Sect. 4, we utilize $K^p_{loc,i}$
directly to synthesize symbolic controllers with no need to explicitly construct $T$.

Figure 3 depicts a comparison between the results reported in [2, Figure 3] and
the ones obtained from our implementation in pFaces. We use an Intel Core i5
CPU, which comes equipped with an internal GPU yielding around 24 PEs being
utilized by pFaces. The implementation stores the distributed containers $K^p_{loc,i}$
as raw data inside the memories of their corresponding PEs. As expected, the
distributed algorithm scales linearly and we are able to go beyond 100 dimensions
in a few seconds, whereas Figure 3 in [2] shows only abstractions up to a
51-dimensional traffic model because constructing the monolithic $T$ begins to incur
an exponential cost for higher dimensions.


Remark 1. Both Algorithms 2 and 3 utilize sparsity of $\bar{\Sigma}$ to reduce the space
complexity of abstractions from $|\bar{X} \times \bar{U}|$ to $\sum_{i=1}^{n} |P_i^f(\bar{X}) \times P_i^f(\bar{U})|$. However,
Algorithm 2 iterates over the space serially. Algorithm 3, on the other hand,
handles the computation over the space in parallel using $P$ PEs.
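
To get a feeling for the size of this reduction, the two counts can be compared for a hypothetical chain-structured system in which each updated state depends on itself, its left neighbor, and a single shared input. The dimensions and cell counts below are made up for illustration.

```python
from math import prod

def monolithic_size(state_cells, input_cells):
    """Number of state-input pairs in the full product space."""
    return prod(state_cells) * prod(input_cells)

def sparse_size(state_cells, input_cells, deps):
    """Sum over components of the sizes of the projected state-input spaces."""
    return sum(
        prod(state_cells[k] for k in sx) * prod(input_cells[k] for k in su)
        for sx, su in deps
    )

n, k = 10, 10                     # 10 state dimensions, 10 cells each (assumed)
state_cells = [k] * n
input_cells = [2]                 # one input with 2 values (assumed)
# Component 0 depends on x_0 only; component i > 0 depends on x_{i-1} and x_i.
deps = [((0,), (0,))] + [((i - 1, i), (0,)) for i in range(1, n)]

full = monolithic_size(state_cells, input_cells)
sparse = sparse_size(state_cells, input_cells, deps)
```

For these numbers the monolithic space has 2 x 10^10 pairs while the component-wise spaces together have 1820, and the latter grows linearly when further chain elements are appended.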
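4 Sparsity-Aware Distributed Synthesis of Symbolic Controllers


Given an abstract system $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$, we define the controllable predecessor
map $CPre^{T} : 2^{\bar{X} \times \bar{U}} \to 2^{\bar{X} \times \bar{U}}$ for $Z \subseteq \bar{X} \times \bar{U}$ by:

$$CPre^{T}(Z) = \{(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U} \mid \emptyset \neq T(\bar{x}, \bar{u}) \subseteq \pi_{\bar{X}}(Z)\}, \tag{3}$$

where $T(\bar{x}, \bar{u})$ is an interpretation of the transition set $T$ as a map $T :
\bar{X} \times \bar{U} \to 2^{\bar{X}}$ that evaluates a set of successor states from a state-input
pair. Similarly, we introduce a component-wise controllable predecessor map
$CPre^{T_i} : 2^{P_i^f(\bar{X}) \times P_i^f(\bar{U})} \to 2^{P_i^f(\bar{X}) \times P_i^f(\bar{U})}$ for any component $i \in \{1, 2, \dots, n\}$
and any $\widetilde{Z} := P_i^f(Z) = \pi_{P_i^f(\bar{X}) \times P_i^f(\bar{U})}(Z)$, as follows:

$$CPre^{T_i}(\widetilde{Z}) = \{(\bar{x}, \bar{u}) \in P_i^f(\bar{X}) \times P_i^f(\bar{U}) \mid \emptyset \neq T_i(\bar{x}, \bar{u}) \subseteq \pi_{\bar{X}_i}(\widetilde{Z})\}. \tag{4}$$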


Proposition 1. The following inclusion holds for any $i \in \{1, 2, \dots, n\}$ and any
$Z \subseteq \bar{X} \times \bar{U}$:

$$P_i^f(CPre^{T}(Z)) \subseteq CPre^{T_i}(P_i^f(Z)).$$


Proof. Consider an element $z_p \in P_i^f(CPre^{T}(Z))$. This implies that there exists
$z \in \bar{X} \times \bar{U}$ such that $z \in CPre^{T}(Z)$ and $z_p = P_i^f(z)$. Consequently, $T_i(z_p) \neq \emptyset$
since $T(z) \neq \emptyset$. Also, since $z \in CPre^{T}(Z)$, then $T(z) \subseteq \pi_{\bar{X}}(Z)$. Now, recall how
$T_i$ is constructed as a component-wise set of transitions in line 3 in Algorithm 2.
Then, we conclude that $T_i(z_p) \subseteq \pi_{\bar{X}_i}(P_i^f(Z))$. By this, we already satisfy the
requirements in (4) such that $z_p = (\bar{x}, \bar{u}) \in CPre^{T_i}(\widetilde{Z})$.
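
The controllable predecessor in (3) is easy to state in code for an explicitly stored transition set. The toy transition set below is hypothetical; note in particular the pair `(1, 'j')`, which is excluded because one of its possible successors falls outside the target.

```python
def cpre(T, Z):
    """Pairs (x, u) whose successor set is non-empty and contained in the
    set of states that appear in Z."""
    target_states = {x for x, _ in Z}
    succ = {}
    for x, u, y in T:
        succ.setdefault((x, u), set()).add(y)
    return {xu for xu, ys in succ.items() if ys <= target_states}

# Toy transition set over states {0, 1, 2} with inputs 'l', 'r', and 'j'.
T = {
    (0, 'r', 1), (1, 'r', 2), (1, 'l', 0), (2, 'l', 1), (2, 'r', 2),
    (1, 'j', 0), (1, 'j', 2),      # input 'j' is non-deterministic at state 1
}
Z = {(2, 'r')}                     # winning pairs so far
pre = cpre(T, Z)
```

Pairs without any outgoing transition never appear in `succ`, which implements the non-emptiness requirement of (3).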


Here, we consider reachability and invariance specifications given by the LTL
formulae $\Diamond\psi$ and $\Box\psi$, respectively, where $\psi$ is a propositional formula over a
set of atomic propositions $AP$. We first construct an initial winning set $Z_\psi =
\{(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U} \mid L(\bar{x}, \bar{u}) \models \psi\}$, where $L : \bar{X} \times \bar{U} \to 2^{AP}$ is some labeling
function. During the rest of this section, we focus on reachability specifications
for the sake of space and a similar discussion can be pursued for invariance
specifications.

Traditionally, to synthesize symbolic controllers for the reachability
specifications $\Diamond\psi$, a monotone function:

$$G(Z) := CPre^{T}(Z) \cup Z_\psi \tag{5}$$

is employed to iteratively compute $Z_\infty = \mu Z.G(Z)$ starting with $Z_0 = \emptyset$. Here,
a notation from $\mu$-calculus is used with $\mu$ as the minimal fixed point operator
and $Z \subseteq \bar{X} \times \bar{U}$ is the operated variable representing the set of winning pairs
$(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$. Set $Z_\infty \subseteq \bar{X} \times \bar{U}$ represents the set of final winning pairs,
after a finite number of iterations. Interested readers can find more details in
[5] and the references therein. The transition map $T$ is used in this fixed-point
computation and, hence, the technique suffers directly from the state-explosion
problem. Algorithm 4 depicts a traditional serial algorithm of symbolic controller
synthesis for reachability specifications. The synthesized controller is a map $C :
\bar{X}_w \to 2^{\bar{U}}$, where $\bar{X}_w \subseteq \bar{X}$ represents a winning (a.k.a. controllable) set of
states. Map $C$ is defined as: $C(\bar{x}) = \{\bar{u} \in \bar{U} \mid (\bar{x}, \bar{u}) \in \mu^{j(\bar{x})} Z.G(Z)\}$, where
$j(\bar{x}) = \inf\{i \in \mathbb{N} \mid \bar{x} \in \pi_{\bar{X}}(\mu^{i} Z.G(Z))\}$, and $\mu^{i} Z.G(Z)$ represents the set of
state-input pairs by the end of the $i$th iteration of the minimal fixed point
computation.

Algorithm 4: Traditional serial algorithm to synthesize $C$ enforcing the
specification $\Diamond\psi$.

Input: Initial winning domain $Z_\psi \subseteq \bar{X} \times \bar{U}$ and $T$
Output: A controller $C : \bar{X}_w \to 2^{\bar{U}}$.

1  $Z_\infty \leftarrow \emptyset$;    ▷ Initialize a running win-pairs set
2  $\bar{X}_w \leftarrow \emptyset$;    ▷ Initialize a running win-states set
3  do
4      $Z_0 \leftarrow Z_\infty$;    ▷ Current win-pairs gets latest win-pairs
5      $Z_\infty \leftarrow CPre^{T}(Z_0) \cup Z_\psi$;    ▷ Update the running win-pairs set
6      $D \leftarrow Z_\infty \setminus Z_0$;    ▷ Separate the new win-pairs
7      foreach $\bar{x} \in \pi_{\bar{X}}(D)$ with $\bar{x} \notin \bar{X}_w$ do
8          $\bar{X}_w \leftarrow \bar{X}_w \cup \{\bar{x}\}$;    ▷ Add new win-states
9          $C(\bar{x}) := \{\bar{u} \in \bar{U} \mid (\bar{x}, \bar{u}) \in D\}$;    ▷ Add new control actions
10     end
11 while $Z_\infty \neq Z_0$;
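
The serial fixed-point computation can be sketched as follows for an explicitly stored transition set. The chain system is a hypothetical toy; the loop records, for every state, the inputs found in the first iteration in which that state becomes winning.

```python
def synthesize_reach(T, Z_psi):
    """Minimal fixed point of G(Z) = CPre(Z) | Z_psi, plus the controller map."""
    succ = {}
    for x, u, y in T:
        succ.setdefault((x, u), set()).add(y)

    def cpre(Z):
        win = {x for x, _ in Z}
        return {xu for xu, ys in succ.items() if ys <= win}

    Z_inf, X_w, C = set(), set(), {}
    while True:
        Z0 = set(Z_inf)
        Z_inf = cpre(Z0) | Z_psi
        D = Z_inf - Z0                       # newly found winning pairs
        for x in {x for x, _ in D} - X_w:    # states that win for the first time
            X_w.add(x)
            C[x] = {u for (xx, u) in D if xx == x}
        if Z_inf == Z0:
            return C, X_w

# Toy system: chain of 4 cells, inputs -1/+1, target is the last cell.
inputs = (-1, +1)
T = {(x, u, x + u) for x in range(4) for u in inputs if 0 <= x + u < 4}
Z_psi = {(3, u) for u in inputs}
C, X_w = synthesize_reach(T, Z_psi)
```

Every cell of the chain ends up winning, and for the non-target cells the controller keeps only the input that moves toward the target.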

A parallel implementation that mitigates the complexity of the fixed-point
computation is introduced in [4, Algorithm 4]. Briefly, for a set $Z \subseteq \bar{X} \times \bar{U}$, each
iteration of $\mu Z.G(Z)$ is computed via parallel traversal in the complete space
$\bar{X} \times \bar{U}$. Each PE is assigned a disjoint set of state-input pairs from $\bar{X} \times \bar{U}$ and
it declares whether, or not, each pair belongs to the next winning pairs (i.e.,
$G(Z)$). Although the algorithm scales well w.r.t. $P$, it still suffers from the
state-explosion problem for a fixed $P$. We present a modified algorithm that utilizes
sparsity to reduce the parallel search space at each iteration.

First, we introduce the component-wise monotone function:

$$G_i(Z) := CPre^{T_i}(P_i^f(Z)) \cup P_i^f(Z_\psi), \tag{6}$$

for any $i \in \{1, 2, \dots, n\}$ and any $Z \subseteq \bar{X} \times \bar{U}$. Now, an iteration in the
sparsity-aware fixed-point can be summarized by the following three steps:

(1) Compute the component-wise sets $G_i(Z)$. Note that $G_i(Z)$ lives in the set
$P_i^f(\bar{X}) \times P_i^f(\bar{U})$.

(2) Recover a monolithic set from $G_i(Z)$, for each $i \in \{1, 2, \dots, n\}$, using the map
$D_i^f$ and intersect these sets. Formally, we denote this intersection by:

$$[G(Z)] := \bigcap_{i=1}^{n} \left(D_i^f(G_i(Z))\right). \tag{7}$$

Note that $[G(Z)]$ is an over-approximation of the monolithic set $G(Z)$, which
we prove in Theorem 1.

(3) Now, based on the next theorem, there is no need for a parallel search in
$\bar{X} \times \bar{U}$ and the search can be done in $[G(Z)]$. More accurately, the search
for new elements in the next winning set can be done in $[G(Z)] \setminus Z$.


Theorem 1. Consider an abstract system $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$. For any set $Z \subseteq
\bar{X} \times \bar{U}$, $G(Z) \subseteq [G(Z)]$.


Proof. Consider any element $z \in G(Z)$. This implies that $z \in Z$, $z \in Z_\psi$ or
$z \in CPre^{T}(Z)$. We show that $z \in [G(Z)]$ for any of these cases.


Case 1 [$z \in Z$]: By the definition of map $P_i^f$, we know that $P_i^f(z) \in P_i^f(Z)$. By
the monotonicity of map $G_i$, $P_i^f(Z) \subseteq G_i(Z)$. This implies that $P_i^f(z) \in
G_i(Z)$. Also, by the definition of map $D_i^f$, we know that $z \in D_i^f(G_i(Z))$.
The above argument holds for any component $i \in \{1, 2, \dots, n\}$, which
implies that $z \in \bigcap_{i=1}^{n} (D_i^f(G_i(Z))) = [G(Z)]$.

Case 2 [$z \in Z_\psi$]: The same argument used for the previous case can be used for
this one as well.

Case 3 [$z \in CPre^{T}(Z)$]: We apply the map $P_i^f$ to both sides of the
inclusion. We then have $P_i^f(z) \in P_i^f(CPre^{T}(Z))$. Using Proposition 1, we
know that $P_i^f(CPre^{T}(Z)) \subseteq CPre^{T_i}(P_i^f(Z))$. This implies that $P_i^f(z) \in
CPre^{T_i}(P_i^f(Z))$. From (6) we obtain that $P_i^f(z) \in G_i(Z)$, and
consequently, $z \in D_i^f(G_i(Z))$. The above argument holds for any
component $i \in \{1, 2, \dots, n\}$. This, consequently, implies that $z \in
\bigcap_{i=1}^{n} (D_i^f(G_i(Z))) = [G(Z)]$, which completes the proof.
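
A small worked instance may make the inclusion concrete. The setting is an assumption made for illustration only: a decoupled two-dimensional system $\bar{x}_i^+ = \bar{x}_i + \bar{u}_i$ on a $4 \times 4$ grid of cells indexed $0,\dots,3$, inputs $\bar{u}_i \in \{-1, 0, 1\}$, no transition when a move leaves the grid, target cell $(3,3)$, and $Z = Z_\psi$.

```latex
% Assumed toy setting: |X x U| = 16 * 9 = 144 state-input pairs.
\begin{align*}
Z = Z_\psi &= \{((3,3),\bar{u}) \mid \bar{u} \in \{-1,0,1\}^2\}, & |Z_\psi| &= 9,\\
CPre^{T}(Z) &= \{(\bar{x},\bar{u}) \mid \bar{x}+\bar{u} = (3,3)\}, & |CPre^{T}(Z)| &= 4,\\
G(Z) &= CPre^{T}(Z) \cup Z_\psi, & |G(Z)| &= 4 + 9 - 1 = 12,\\
G_i(Z) &= \{(2,1),(3,0)\} \cup \{(3,-1),(3,0),(3,1)\}, & |G_i(Z)| &= 4,\\
[G(Z)] &= D_1^f(G_1(Z)) \cap D_2^f(G_2(Z)), & |[G(Z)]| &= 4 \cdot 4 = 16,\\
& & |[G(Z)] \setminus Z| &= 16 - 9 = 7.
\end{align*}
```

Every pair of $G(Z)$ has both component projections in the corresponding $G_i(Z)$, so $G(Z) \subseteq [G(Z)]$ holds, while a pair such as $((2,3),(1,-1))$ lies in $[G(Z)]$ but not in $G(Z)$. The search for new winning pairs is confined to 7 candidates instead of the 144 pairs of the full space.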


Remark 2. An initial computation of the controllable predecessor is done
component-wise in step (1), which utilizes the sparsity of $\bar{\Sigma}$ and can be
easily implemented in parallel. Only in step (3) is a monolithic search required.
However, unlike the implementation in [4, Algorithm 4], the search is performed
only for a subset of $\bar{X} \times \bar{U}$, which is $[G(Z)] \setminus Z$.


Note that dynamical systems possess some locality property (i.e., starting from
nearby states, successor states are also nearby) and an initial winning set will
grow incrementally with each fixed-point iteration. This makes the set $[G(Z)] \setminus Z$
relatively small w.r.t. $|\bar{X} \times \bar{U}|$. We clarify this and the result in Theorem 1 with
a small example.


4.1 An Illustrative Example 


For the sake of illustrating the proposed sparsity-aware synthesis technique, we
provide a simple two-dimensional example. Consider a robot described by the
following difference equation:

$$\begin{bmatrix} x_1^+ \\ x_2^+ \end{bmatrix} = \begin{bmatrix} x_1 + \tau u_1 \\ x_2 + \tau u_2 \end{bmatrix},$$

where $(x_1, x_2) \in X := X_1 \times X_2$ is a state vector and $(u_1, u_2) \in U := U_1 \times U_2$ is an
input vector. Figure 4 shows a visualization of the sets related to this
sparsity-aware technique for symbolic controller synthesis for one fixed-point iteration.
Set $Z_\psi$ is the initial winning-set (a.k.a. target-set for reachability specifications)
constructed from a given specification (e.g., a region in $X$ to be reached by
the robot) and $Z$ is the winning-set of the current fixed-point iteration. For
simplicity, all sets are projected on $\bar{X}$ and the readers can think of $\bar{U}$ as an
additional dimension perpendicular to the surface of this paper.

Fig. 4. A visualization of one arbitrary fixed-point iteration of the sparsity-aware
synthesis technique for a two-dimensional robot system.

As depicted in Fig. 4, the next winning-set $G(Z)$ is over-approximated by
$[G(Z)]$, as a result of Theorem 1. Algorithm 4 in [4] searches for $G(Z)$ in $(\bar{X}_1 \times
\bar{X}_2) \times (\bar{U}_1 \times \bar{U}_2)$. This work suggests searching for $G(Z)$ in $[G(Z)] \setminus Z$ instead.

Fig. 5. The evolution of the fixed-point sets for the robot example by the end of
fixed-point iterations 5 (left side) and 228 (right side). A video of all iterations can be found
in: http://goo.gl/aeganf.


4.2 A Sparsity-Aware Parallel Algorithm for Symbolic Controller 
Synthesis 


We propose Algorithm 5 to parallelize sparsity-aware controller synthesis. The
main differences between this algorithm and Algorithm 4 in [4] are in lines 9-12, which
correspond to computing $[G(Z)]$ at each iteration of the fixed-point
computation. Line 13 is modified to do the parallel search inside $[G(Z)] \setminus Z$ instead of
$\bar{X} \times \bar{U}$ in the original algorithm. The rest of the algorithm is well documented
in [4].

Algorithm 5: Proposed parallel sparsity-aware algorithm to synthesize $C$
enforcing specification $\Diamond\psi$.

Input: Initial winning domain $Z_\psi \subseteq \bar{X} \times \bar{U}$ and $T$
Output: A controller $C : \bar{X}_w \to 2^{\bar{U}}$.

1  $Z_\infty \leftarrow \emptyset$;    ▷ Initialize a shared win-pairs set
2  $\bar{X}_w \leftarrow \emptyset$;    ▷ Initialize a shared win-states set
3  do
4      $Z_0 \leftarrow Z_\infty$;    ▷ Current win-pairs set gets latest win-pairs
5      for all $p \in \{1, 2, \dots, P\}$ do
6          $Z^p_{loc} \leftarrow \emptyset$;    ▷ Initialize a local win-pairs set
7          $\bar{X}^p_{w,loc} \leftarrow \emptyset$;    ▷ Initialize a local win-states set
8      end
9      $[G] \leftarrow \bar{X} \times \bar{U}$;    ▷ Initialize $[G(Z)]$
10     for all $i \in \{1, 2, \dots, n\}$ do
11         $[G] \leftarrow [G] \cap D_i^f(G_i(Z_\infty))$;    ▷ Over-approximate
12     end
13     for all $(\bar{x}, \bar{u}) \in [G] \setminus Z_\infty$ in parallel with index $j$ do
14         $p = I(j)$;    ▷ Identify a PE
15         $Posts \leftarrow Q \circ K^p_{loc}(\bar{x}, \bar{u})$;    ▷ Compute successor states
16         if $Posts \subseteq Z_0 \cup Z_\psi$ then
17             $Z^p_{loc} \leftarrow Z^p_{loc} \cup \{(\bar{x}, \bar{u})\}$;    ▷ Record a winning pair
18             $\bar{X}^p_{w,loc} \leftarrow \bar{X}^p_{w,loc} \cup \{\bar{x}\}$;    ▷ Record a winning state
19             if $\bar{x} \notin \pi_{\bar{X}}(Z_0)$ then
20                 $C(\bar{x}) \leftarrow C(\bar{x}) \cup \{\bar{u}\}$;    ▷ Record a control action
21             end
22         end
23     end
24     for all $p \in \{1, 2, \dots, P\}$ do
25         $Z_\infty \leftarrow Z_\infty \cup Z^p_{loc}$;    ▷ Update the shared win-pairs set
26         $\bar{X}_w \leftarrow \bar{X}_w \cup \bar{X}^p_{w,loc}$;    ▷ Update the shared win-states set
27     end
28 while $Z_\infty \neq Z_0$;
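
The effect of restricting the search can be sketched on a toy version of the two-dimensional robot. Everything here is an assumption made for illustration: a 4 x 4 grid, inputs in {-1, 0, 1}, a thread pool standing in for the PEs, and a winning test that follows (3) and (5) directly. Because the components are decoupled in this toy system, the over-approximation is built as a product of the component sets instead of scanning the full space.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N, U1 = 4, (-1, 0, 1)
STATES = list(product(range(N), repeat=2))
INPUTS = list(product(U1, repeat=2))

def post(x, u):
    y = (x[0] + u[0], x[1] + u[1])
    return {y} if all(0 <= c < N for c in y) else set()

def hull(Z, Z_psi):
    """Over-approximation of the next winning set from component-wise sets."""
    comp = []
    for i in range(2):
        win_i = {x[i] for x, _ in Z}
        pre_i = {(xi, ui) for xi in range(N) for ui in U1 if xi + ui in win_i}
        comp.append(pre_i | {(x[i], u[i]) for x, u in Z_psi})
    return {((a, b), (ua, ub)) for a, ua in comp[0] for b, ub in comp[1]}

def synthesize(Z_psi, sparse=True, workers=4):
    Z_inf, searched = set(), 0
    while True:
        Z0 = set(Z_inf)
        win = {x for x, _ in Z0}
        space = hull(Z0, Z_psi) if sparse else set(product(STATES, INPUTS))
        cand = space - Z0                       # pairs still to be examined
        searched += len(cand)

        def check(pair):
            ys = post(*pair)
            return pair if ys and ys <= win else None

        with ThreadPoolExecutor(max_workers=workers) as pool:
            new = {p for p in pool.map(check, cand) if p is not None}
        Z_inf = Z0 | new | Z_psi
        if Z_inf == Z0:
            return Z_inf, searched

Z_psi = {((3, 3), u) for u in INPUTS}
Z_sparse, n_sparse = synthesize(Z_psi, sparse=True)
Z_full, n_full = synthesize(Z_psi, sparse=False)
```

Both runs return the same winning set, but the sparsity-aware run examines fewer candidate pairs in total, which is the saving the restricted search is meant to provide.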

The algorithm is implemented in pFaces as updated versions of the kernels
GBFP and GBFP$_m$ in [4]. We synthesize a reachability controller for the robot
example presented earlier. Figure 5 shows an arena with obstacles depicted as
red boxes. It depicts the result at the fixed-point iterations 5 and 228. The blue
box indicates the target set (i.e., $Z_\psi$). The region colored with purple indicates
the current winning states. The orange region indicates $[G(Z)] \setminus Z$. The black
box is the next search region, which is a rectangular over-approximation of
$[G(Z)] \setminus Z$. We over-approximate $[G(Z)] \setminus Z$ with such a rectangle as it is
straightforward for PEs in pFaces to work with rectangular parallel jobs. The synthesis
problem is solved in 322 fixed-point iterations. Unlike the parallel algorithm in
[4], which searches for the next winning region inside $\bar{X} \times \bar{U}$ at each iteration,
the implementation of the proposed algorithm reduces the parallel search by an
average of 87% when searching inside the black boxes in each iteration.


Fig. 6. An autonomous vehicle trying to avoid a sudden obstacle on the highway. 


5 Case Study: Autonomous Vehicle 


We consider a vehicle described by the following 7-dimensional discrete-time
single track (ST) model [1]:

$$\begin{aligned}
x_1^+ &= x_1 + \tau x_4 \cos(x_5 + x_7),\\
x_2^+ &= x_2 + \tau x_4 \sin(x_5 + x_7),\\
x_3^+ &= x_3 + \tau u_1,\\
x_4^+ &= x_4 + \tau u_2,\\
x_5^+ &= x_5 + \tau x_6,\\
x_6^+ &= x_6 + \frac{\tau \mu m}{I_z (l_r + l_f)} \Big( l_f C_{S,f}(g l_r - u_2 h_{cg}) x_3 + \big(l_r C_{S,r}(g l_f + u_2 h_{cg}) - l_f C_{S,f}(g l_r - u_2 h_{cg})\big) x_7\\
&\qquad - \big(l_f^2 C_{S,f}(g l_r - u_2 h_{cg}) + l_r^2 C_{S,r}(g l_f + u_2 h_{cg})\big) \frac{x_6}{x_4} \Big),\\
x_7^+ &= x_7 + \frac{\tau \mu}{x_4 (l_r + l_f)} \Big( C_{S,f}(g l_r - u_2 h_{cg}) x_3 - \big(C_{S,r}(g l_f + u_2 h_{cg}) + C_{S,f}(g l_r - u_2 h_{cg})\big) x_7\\
&\qquad + \big(C_{S,r}(g l_f + u_2 h_{cg}) l_r - C_{S,f}(g l_r - u_2 h_{cg}) l_f\big) \frac{x_6}{x_4} \Big) - x_6,
\end{aligned}$$

where $x_1$ and $x_2$ are the position coordinates, $x_3$ is the steering angle, $x_4$ is the
heading velocity, $x_5$ is the yaw angle, $x_6$ is the yaw rate, and $x_7$ is the slip angle.
Variables $u_1$ and $u_2$ are inputs and they control the steering angle and heading
velocity, respectively. Input and state variables are all members of $\mathbb{R}$. The model
takes into account tire slip, making it a good candidate for studies that consider
planning of evasive maneuvers that are very close to the physical limits. We
consider an update period $\tau = 0.1$ s and the following parameters for a BMW
320i car: $m = 1093$ [kg] as the total mass of the vehicle, $\mu = 1.048$ as the friction
coefficient, $l_f = 1.156$ [m] as the distance from the front axle to center of gravity
(CoG), $l_r = 1.422$ [m] as the distance from the rear axle to CoG, $h_{cg} = 0.574$
[m] as the height of CoG, $I_z = 1791.0$ [kg m$^2$] as the moment of inertia for
entire mass around $z$ axis, $C_{S,f} = 20.89$ [1/rad] as the front cornering stiffness
coefficient, and $C_{S,r} = 19.89$ [1/rad] as the rear cornering stiffness coefficient.

To construct an abstract system $\bar{\Sigma}$, we consider a bounded version of the state
set $X := [0, 84] \times [0, 6] \times [-0.18, 0.8] \times [12, 21] \times [-0.5, 0.5] \times [-0.8, 0.8] \times [-0.1, 0.1]$,
a state quantization vector $\eta_{\bar{X}} = (1.0, 1.0, 0.01, 3.0, 0.05, 0.1, 0.02)$, an input set
$U := [-0.4, 0.4] \times [-4, 4]$, and an input quantization vector $\eta_{\bar{U}} = (0.1, 0.5)$.
Table 1. Used HW configurations for testing the proposed technique.

Identifier | Description | PEs | Frequency
HW1 | Local machine: Intel Xeon E5-1620 | 8 | 3.6 GHz
HW2 | AWS instance p3.16xlarge: Intel(R) Xeon(R) E5-2686 | 64 | 2.3 GHz
HW3 | AWS instance c5.18xlarge: Intel Xeon Platinum 8000 | 72 | 3.6 GHz


Table 2. Results obtained after running the experiments EX1 and EX2.

Experiment | Memory | $|\bar{X} \times \bar{U}|$ | HW | Time (pFaces/GBFP$_m$) | Time (this work) | Speedup
EX1 | 22.1 GB | 23.8 × 10^9 | HW2 | 2.1 h | 0.5 h | 4.2x
EX1 | 22.1 GB | 23.8 × 10^9 | HW3 | 1.9 h | 0.4 h | 4.7x
EX2 | 49.2 GB | 52.9 × 10^9 | HW1 | >24 h | 8.7 h | >2.7x
EX2 | 49.2 GB | 52.9 × 10^9 | HW2 | 8.1 h | 3.2 h | 2.5x


We are interested in an autonomous operation of the vehicle on a highway.
Consider a situation on a two-lane highway when an accident happens
suddenly on the same lane on which our vehicle is traveling. The vehicle's controller
should find a safe maneuver to avoid the crash with the next-appearing
obstacle. Figure 6 depicts such a situation. We over-approximate the obstacle with
the hyper-box $[28, 50] \times [0, 3] \times [-0.18, 0.8] \times [12, 21] \times [-0.5, 0.5] \times [-0.8, 0.8] \times
[-0.1, 0.1]$.

We run the implementation on different HW configurations. We use a local
machine and instances from Amazon Web Services (AWS) cloud computing
services. Table 1 summarizes those configurations. We also run two different
experiments. For the first one (denoted by EX1), the goal is to only avoid
the crash with the obstacle. We use a smaller version of the original state set
$X := [0, 50] \times [0, 6] \times [-0.18, 0.8] \times [11, 19] \times [-0.5, 0.5] \times [-0.8, 0.8] \times [-0.1, 0.1]$.
The second one (denoted by EX2) targets the full-sized highway window (84 m),
and the goal is to avoid colliding with the obstacle and get back to the right
lane. Table 2 reports the obtained results. The reported times are for
constructing finite abstractions of the vehicle and synthesizing symbolic controllers. Note
that our results easily outperform the initial kernels in pFaces, which itself
outperforms serial implementations with speedups up to 30000x as reported in [4].
The speedup in EX1 is higher as the obstacle consumes a relatively bigger
volume in the state space. This makes $[G(Z)] \setminus Z$ smaller and, hence, faster for our
implementation.


6 Conclusion and Future Work 


A unified approach that exploits the sparsity of the interconnection structure in
dynamical systems is introduced for the construction of finite abstractions
and the synthesis of their symbolic controllers. In addition, parallel algorithms
are designed to target HPC platforms and are implemented within the pFaces
framework. The results show remarkable reductions in computation times.
We showed the effectiveness of the results on a 7-dimensional model of a BMW
320i car by designing a controller that keeps the car in its travel lane unless
the lane is blocked.

The technique still suffers from the memory inefficiency inherited from
pFaces. More specifically, the data used during the computation of abstractions
and the synthesis of symbolic controllers is not encoded, and using raw data
requires larger amounts of memory. Future work will focus on designing
distributed data structures that achieve a balance between memory size and
access time.

Abstract. Finding good variable orders for decision diagrams is essential
for their effective use. We consider Multiway Decision Diagrams
(MDDs) encoding a set of fixed-size vectors satisfying a set of linear 
invariants. Two critical applications of this problem are encoding the 
state space of a discrete-event discrete-state system (DEDS) and encoding
all solutions to a set of integer constraints. After studying the relations
between the MDD structure and the constraints imposed by the linear
invariants, we define iRank, a new variable order metric that exploits
the knowledge embedded in these invariants. We evaluate iRank against
other previously proposed metrics on a benchmark of 40 different DEDS 
and show that it is a better predictor of the MDD size and it is better 
at driving heuristics for the generation of good variable orders. 


Keywords: Decision diagrams · Variable order metrics and computation


1 Introduction 


Decision diagrams (DDs) are a popular data structure to encode large sets 
of structured data, for example vectors whose elements take values over finite 
domains, but it is well-known [10] that the size of the DD strongly depends on 
how the structure of the data (its “variables”) is mapped to the structure of the 
DD (its “levels”). The problem of determining the association of variable(s) to 
levels is the “variable ordering problem” and it is known that finding an optimal 
order is an NP-complete problem [9] for any DD class, including binary DDs 
(BDDs [10]) and multiway DDs (MDDs [19]). This has given rise to a variety 
of metrics (to compare the effectiveness of two orders without actually building 
the corresponding DDs) and of heuristics (to compute sub-optimal orders, often 
by attempting to optimize a given metric). DDs play a central role in many sys- 
tem verification tools [4,11,14,20,22], where they typically support state space 
exploration. Tools often make use of general-purpose DD libraries [8, 18, 26, 27]. 
Libraries typically support dynamic reordering to improve the current order at
run-time, while the definition of an initial order (static ordering) is typically
up to the verification tool, which can rely on domain knowledge. The two
problems are synergistic: reordering works better if the initial order is at
least fairly good.

Our research seeks to find good variable order metrics and good variable 
order heuristics for MDDs encoding sets of fixed-size vectors, when these vectors 
satisfy some linear invariants. We want to answer whether it is possible to lever- 
age invariant information to define effective metrics and heuristics for variable 
order. Two applications where this is important are encoding the state space of a 
discrete-event discrete state system and encoding all solutions to a set of integer 
constraints. In this paper, we concentrate on the first problem, but also address a 
special case of the second. Specifically, we study the relationship between MDDs 
and linear invariants with integer coefficients, and define two new metrics, PF 
and iRank, and associated heuristic and meta-heuristic. PF and iRank exploit the 
constraint imposed by the invariants. Our evaluation shows that iRank is superior 
to any other metric we consider, in all experiments we performed. 

We do not discuss the state of the art on heuristics (see [7] for a full survey),
but only metrics and how metric optimization can guide a meta-heuristic.
After the necessary background in Sect. 2, Sects. 3 and 4 define the metrics PF
and iRank, based on a number of observations and propositions on the relation
between MDDs and invariants. Section 5 experimentally evaluates the two metrics
against several other metrics on 40 different models, considering thousands of
variable orders. Section 6 summarizes our results and discusses future work.


2 Background 


Let $\mathbb{B} = \{\bot, \top\}$, $\mathbb{N}$, and $\mathbb{Z}$ denote the sets of
booleans, natural numbers, and integers, respectively. All other sets are denoted
by calligraphic letters, e.g., $\mathcal{A}$.


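2.1 Discrete-Event Discrete-State Systems and Their State Space

A discrete-event discrete-state system can be generally described by providing:


(1) The set of potential states $\mathcal{S}_{pot}$, defining the type of system
states. We assume $\mathcal{S}_{pot} = \mathbb{N}^{\mathcal{V}}$; i.e., a state
$\mathbf{m}$ is a valuation of a finite set $\mathcal{V}$ of natural variables.

(2) The state $\mathbf{m}_{init} \in \mathcal{S}_{pot}$, describing the initial
state of the system.

(3) The relation $\mathcal{R} \subseteq \mathcal{S}_{pot} \times \mathcal{S}_{pot}$,
describing the state-to-state transitions; if $(\mathbf{m}, \mathbf{m}') \in
\mathcal{R}$, the system can move from $\mathbf{m}$ to $\mathbf{m}'$ in one step.
We assume that $\mathcal{R}$ is defined by a finite set of events $\mathcal{E}$
and a function $Effect : \mathcal{E} \times \mathcal{S}_{pot} \to
(\mathcal{S}_{pot} \cup \{\circ\})$, specifying the unique state $\mathbf{m}'$
reached if event $e$ occurs in state $\mathbf{m}$, none if
$Effect(e, \mathbf{m}) = \circ$ ($e$ is disabled in $\mathbf{m}$). We write
$\mathbf{m} \xrightarrow{e} \mathbf{m}'$ iff $Effect(e, \mathbf{m}) = \mathbf{m}'
\neq \circ$.


The reachable states are $\mathcal{S}_{rch} = \{\mathbf{m} : \exists e_1, \ldots,
e_n \in \mathcal{E}, \mathbf{m}_{init} \xrightarrow{e_1} \cdots \xrightarrow{e_n}
\mathbf{m}\}$ and, for such a system, an invariant is a boolean function
$f : \mathcal{S}_{pot} \to \mathbb{B}$ with the property that it evaluates to
$\top$ in all reachable states: $\mathbf{m} \in \mathcal{S}_{rch} \Rightarrow
f(\mathbf{m})$, while it may be either $\top$ or $\bot$ in the unreachable states
$\mathcal{S}_{pot} \setminus \mathcal{S}_{rch}$.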

We specify DEDSs as Petri nets, because of their widespread use and the
large body of literature on Petri net invariants. In Petri net terminology, the
evaluation of variables describes the number of tokens in the set $\mathcal{P}$
of places (thus the state, or marking, is a vector in $\mathbb{N}^{\mathcal{P}}$),
the events $\mathcal{E}$ correspond to the transitions $\mathcal{T}$, while two
$\mathbb{N}^{\mathcal{P} \times \mathcal{T}}$ matrices $C^-$ and $C^+$ define the
system evolution: $Effect(t, \mathbf{m}) = \mathbf{m} + C^+[\mathcal{P}, t] -
C^-[\mathcal{P}, t]$ (transition firing) iff $\mathbf{m} \ge C^-[\mathcal{P}, t]$,
otherwise $Effect(t, \mathbf{m}) = \circ$, i.e., $t$ is disabled in $\mathbf{m}$,
where $\ge$ is interpreted component-wise. The incidence matrix is
$C = C^+ - C^-$, so the net change to the marking caused by firing transition $t$
is $C[\mathcal{P}, t]$. Figure 1 shows two Petri nets used as running examples.
Places are shown as circles, transitions as bars, and $C^-$ ($C^+$) as incoming
(outgoing) arcs for transitions, with the corresponding value in $C^-$ ($C^+$)
shown on the arc (omitted if 1). The incidence matrix and initial marking
are next to the nets.
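
As an illustration of the firing rule, the following Python sketch implements
Effect and a plain breadth-first exploration of the reachable markings. The net
used here is an assumption: a small fork-and-join net written for this example
(the place names follow the figures, but the transitions `fork`, `a1`, `a2`,
`b1`, `b2`, `join` and the initial marking are hypothetical), and `None` plays
the role of the "disabled" outcome.

```python
from collections import deque

# Hypothetical fork-and-join net (an assumption made for this sketch).
PLACES = ["P0", "P1a", "P2a", "P3a", "P1b", "P2b", "P3b"]
INDEX = {p: i for i, p in enumerate(PLACES)}

# Each transition is a pair (input arc weights, output arc weights),
# i.e., one sparse column of C^- and one sparse column of C^+.
TRANSITIONS = {
    "fork": ({"P0": 1}, {"P1a": 1, "P1b": 1}),
    "a1": ({"P1a": 1}, {"P2a": 1}),
    "a2": ({"P2a": 1}, {"P3a": 1}),
    "b1": ({"P1b": 1}, {"P2b": 1}),
    "b2": ({"P2b": 1}, {"P3b": 1}),
    "join": ({"P3a": 1, "P3b": 1}, {"P0": 1}),
}
M_INIT = (2, 0, 0, 0, 0, 0, 0)  # two tokens in P0


def effect(t, m):
    """Marking reached by firing t in m, or None if t is disabled in m."""
    pre, post = TRANSITIONS[t]
    # Enabling test: m >= C^-[P, t], component-wise.
    if any(m[INDEX[p]] < w for p, w in pre.items()):
        return None
    new = list(m)
    for p, w in pre.items():
        new[INDEX[p]] -= w
    for p, w in post.items():
        new[INDEX[p]] += w
    return tuple(new)


def reachable(m_init):
    """Explicit (non-symbolic) breadth-first computation of the reachable set."""
    seen = {m_init}
    queue = deque([m_init])
    while queue:
        m = queue.popleft()
        for t in TRANSITIONS:
            nxt = effect(t, m)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

On this toy net, every marking returned by `reachable(M_INIT)` keeps the token
sums over {P0, P1a, P2a, P3a} and over {P0, P1b, P2b, P3b} equal to 2.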

A p-flow is a vector $\pi \in \mathbb{Z}^{\mathcal{P}} \setminus \{0\}$ such that
$\pi^T \cdot C = 0$, and its support is $Supp(\pi) = \{v \in \mathcal{P} :
\pi[v] \neq 0\}$. A p-flow $\pi$ implies a linear invariant of the form
$\forall \mathbf{m} \in \mathcal{S}_{rch} : \pi^T \cdot \mathbf{m} = \pi^T \cdot
\mathbf{m}_{init}$, where $\pi^T \cdot \mathbf{m}_{init} = Tc(\pi)$ is obviously a
constant value, the token count of the invariant, which depends only on
$\mathbf{m}_{init}$. If clear from the context, $\pi$ may refer to either a p-flow
or the implied invariant.
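
The step from the condition on the incidence matrix to the invariant can be
spelled out as follows. This is a short derivation using only the firing rule
given earlier; $t$ denotes any transition enabled in a marking $\mathbf{m}$, and
$\mathbf{m}'$ is the marking reached by firing it.

```latex
\begin{align*}
\mathbf{m}' &= \mathbf{m} + C^+[\mathcal{P}, t] - C^-[\mathcal{P}, t]
             = \mathbf{m} + C[\mathcal{P}, t], \\
\pi^T \cdot \mathbf{m}' &= \pi^T \cdot \mathbf{m} + \pi^T \cdot C[\mathcal{P}, t]
             = \pi^T \cdot \mathbf{m}
             \qquad \text{since } \pi^T \cdot C = 0 .
\end{align*}
```

By induction on the length of the firing sequence leading from the initial
marking to $\mathbf{m}$, the weighted sum $\pi^T \cdot \mathbf{m}$ therefore keeps
the value it has in the initial marking, which is the token count $Tc(\pi)$.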

P-flows with no negative entries are called p-semiflows. Let $\mathcal{F}$ be the
set of p-flows, $\mathcal{F}^+$ the set of p-semiflows, and $\mathcal{F}^- =
\mathcal{F} \setminus \mathcal{F}^+$ the p-flows that are not p-semiflows. Since
multiplying a p-flow by a non-zero integer results in a p-flow, these sets are
either empty or infinite. Figure 1 shows the minimal p-flows
(defined later) as column vectors, with the token count below the vector.

A p-semiflow $\pi$ describes a conservative invariant, which implies a bound
$\mathbf{m}[v] \le Tc(\pi)/\pi[v]$ on the number of tokens in each place $v$ of
the support of $\pi$ for any reachable marking $\mathbf{m}$. Column “bnd” in
Fig. 1 reports these bounds. The two p-semiflows in Fig. 1(A) express the
following invariants:


$f_1 : \forall \mathbf{m} \in \mathcal{S}_{rch},\ \mathbf{m}[P_0] + \mathbf{m}[P_{1b}] + \mathbf{m}[P_{2b}] + \mathbf{m}[P_{3b}] = 2,$
$f_2 : \forall \mathbf{m} \in \mathcal{S}_{rch},\ \mathbf{m}[P_0] + \mathbf{m}[P_{1a}] + \mathbf{m}[P_{2a}] + \mathbf{m}[P_{3a}] = 2.$


These in turn imply that the number of tokens in each place is bounded by 2.
We assume that each place $v$ is bounded by some $n_v \in \mathbb{N}$, and redefine
$\mathcal{S}_{pot}$ as $\times_{v \in \mathcal{P}} \{0, 1, \ldots, n_v\}$. This
ensures that $\mathcal{S}_{rch}$ is finite and therefore can be encoded
in a (large enough but finite) MDD. This is the case if the Petri net is covered
by conservative invariants, i.e., each place is in the support of some p-semiflow.
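
The bound implied by a set of p-semiflows can be computed mechanically. The
sketch below is only illustrative: the function name `place_bounds` and the
dictionary-based representation of p-semiflows are assumptions, and the integer
division reflects the fact that markings are vectors of natural numbers.

```python
def place_bounds(semiflows, token_counts):
    """Tightest bound per place implied by the given p-semiflows.

    semiflows: list of dicts mapping a place name to its positive coefficient.
    token_counts: list with the token count Tc of each p-semiflow.
    Places that appear in no support are absent from the result, i.e., the
    given p-semiflows say nothing about them.
    """
    bounds = {}
    for pi, tc in zip(semiflows, token_counts):
        for place, weight in pi.items():
            if weight > 0:
                bound = tc // weight  # m[v] <= Tc(pi) / pi[v], rounded down
                bounds[place] = min(bounds.get(place, bound), bound)
    return bounds


# The two p-semiflows behind the invariants f_1 and f_2 (token count 2 each).
f1 = {"P0": 1, "P1b": 1, "P2b": 1, "P3b": 1}
f2 = {"P0": 1, "P1a": 1, "P2a": 1, "P3a": 1}
bounds = place_bounds([f1, f2], [2, 2])
```

For the two invariants above, every place receives the bound 2, as stated in the
text; a coefficient larger than 1 would tighten the bound of the corresponding
place.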

Work on Petri net invariants has mainly targeted $\mathcal{F}^+$ rather than
$\mathcal{F}$, possibly because it is easier to compute properties, like the
bounds of places, with $\mathcal{F}^+$. On the other hand, $\mathcal{F}$ can be
characterized by a basis (whose size equals the dimension of the null space of
$C$, thus cannot exceed the smaller of $|\mathcal{P}|$ and $|\mathcal{T}|$), while
$\mathcal{F}^+$ can only be characterized by a minimal generator, the smallest
set of vectors that can generate its elements through non-negative integer linear
combinations of its elements. It has been shown [15] that this set is finite, is
unique (thus we can denote it as $\mathcal{F}^+_{min}$), and consists of all
minimal p-semiflows




Fig. 1. Two Petri nets, their incidence matrices, and their p-flows. 


(where a p-semiflow is minimal if the g.c.d. of its coefficients is 1 and its
support does not strictly contain the support of another p-semiflow). However,
$\mathcal{F}^+_{min}$ may have size exponential in $|\mathcal{P}|$. A classic
example of this is a Petri net sequence of fork and join models with $n + 1$
transitions and $2n + 1$ places whose $\mathcal{F}^+_{min}$ has size $2^n$.
Figure 1(B) shows the case $n = 3$. The reader can find in [15] full details
and a thorough analysis of the cost of computing $\mathcal{F}^+_{min}$.

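In addition to $\mathcal{S}_{rch}$, we can define $\mathcal{S}_{sat} =
\{\mathbf{m} \in \mathbb{N}^{\mathcal{P}} : \forall \pi \in \mathcal{F},
\mathbf{m} \cdot \pi = Tc(\pi)\}$. Obviously $\mathcal{S}_{rch} \subseteq
\mathcal{S}_{sat}$. We let $\mathcal{S}$ refer to either when the distinction is
not relevant. Note that $\mathcal{S}_{sat}$ is a superset of the linearized
reachability set [21] $\{\mathbf{m} \in \mathbb{N}^{\mathcal{P}} : \exists
\mathbf{y} \in \mathbb{N}^{\mathcal{T}}, \mathbf{m} = \mathbf{m}_{init} + C \cdot
\mathbf{y}\}$, used in Petri net theory to devise a semi-decidable procedure for
safety properties.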


2.2 Multiway Decision Diagrams 


Definition 1 (MDD). Given a global domain $\mathcal{X} = \times_{k=1}^{L}
\mathcal{X}_k$, where each local domain $\mathcal{X}_k$ is of the form
$\{0, 1, \ldots, n_k\}$ for some $n_k \in \mathbb{N}$, an (ordered,
quasi-reduced) MDD over $\mathcal{X}$ is a directed acyclic graph with exactly two
terminal nodes, $\top$ and $\bot$, at level 0 (we write $\top.lvl = \bot.lvl = 0$),
with each non-terminal node $p$ at some level $p.lvl = k \in \{1, \ldots, L\}$
having one outgoing edge for each $i \in \mathcal{X}_k$, pointing to a node $p[i]$
at level $k - 1$ or to $\bot$, and with no duplicates (there cannot be nodes $p$
and $q$ at level $k$ with $p[i] = q[i]$ for all $i \in \mathcal{X}_k$) or redundant
nodes (node $p$ at level $k$ is redundant if $p[0] = p[i]$ for all
$i \in \mathcal{X}_k$) pointing to $\bot$. The function $f_p : \mathcal{X} \to
\mathbb{B}$ encoded by an MDD node $p$ is recursively defined as
$f_p(i_1, \ldots, i_L) = f_{p[i_k]}(i_1, \ldots, i_L)$ if $p.lvl = k > 0$, and
$f_p(i_1, \ldots, i_L) = p$ if $p.lvl = 0$. Interpreting $f_p$ as an indicator
function, $p$ also encodes the set $\mathcal{S}_p \subseteq \mathcal{X}$, defined
as $\mathcal{S}_p = \{(i_1, \ldots, i_L) : f_p(i_1, \ldots, i_L)\}$. This is the
set of variable assignments compatible with the paths from $p$ to $\top$.

[Figure 2 panels, each listing the variable order from level L (top) to level 1
(bottom), the place bounds (bnd), the p-semiflows $\mathcal{F}^+_{min}$, and the
MDD. Order (A): P2a, P1a, P3a, P0, P1b, P3b, P2b. Order (B): P1b, P1a, P2b, P2a,
P3b, P3a, P0. All place bounds are 2.]

Fig. 2. P-semiflows and MDD for two variable orders for the net in Fig. 1(A).


MDDs are a canonical representation of subsets of $\mathcal{X}$: given MDD nodes
$p$ and $q$ at the same level, $\mathcal{S}_p = \mathcal{S}_q$ iff $p = q$. We
observe that quasi-reduced MDDs differ from the more common fully-reduced MDDs,
which allow edges to skip levels by eliminating all redundant nodes, not just
those encoding $\bot$. As will become clear, though, the quasi-reduced MDD
encoding the state space of a Petri net covered by invariants cannot contain
redundant nodes, thus coincides with the fully-reduced MDD for such models. When
drawing MDDs, edges point down and we omit node $\bot$, edges pointing to it, and
the corresponding cells in the originating node, so that, if node $p$ at level $k$
with $\mathcal{X}_k = \{0, \ldots, 4\}$ is drawn as [2|3], it means that
$p[0] = p[1] = p[4] = \bot$. We also omit node $\top$ and edges
pointing to it, but not the corresponding cell in the originating node.

MDDs have been successfully employed to generate and store the reachable
state space of DEDSs, in particular Petri nets, using fixpoint symbolic iterations.
The MDD representation of a state space $\mathcal{S}_{rch}$ is computed as the
least fixpoint of the equation $\mathcal{Z} = \mathcal{Z} \cup
\{\mathbf{m}_{init}\} \cup \{\mathbf{m}' : \mathbf{m} \in \mathcal{Z} \wedge
\exists e \in \mathcal{E}, \mathbf{m} \xrightarrow{e} \mathbf{m}'\}$, while the
generation of $\mathcal{S}_{sat}$ simply needs to consider one flow (and
associated invariant) at a time, thus can be achieved by performing exactly
$|\mathcal{F}| - 1$ intersections of
the sets of assignments satisfying each individual constraint.

Since we focus on the size of the MDD encoding $\mathcal{S}$, we only consider
MDDs with a single root node $r$, so that $\mathcal{S}_r = \mathcal{S}$. Letting
$\mathcal{N}_k$ be the set of MDD nodes at level $k$, we characterize the MDD size
in terms of its nonterminal nodes $\mathcal{N}$, i.e., $|\mathcal{N}| =
\sum_{k=1}^{L} |\mathcal{N}_k|$ (although, unlike for BDDs [10] where nodes have
exactly two outgoing edges, the number of MDD edges $\sum_{k=1}^{L} |\{(p, i) :
p \in \mathcal{N}_k, p[i] \neq \bot\}|$ could also be a meaningful measure of
size). The first step to generate $\mathcal{S}$ is to map the places
$\mathcal{P}$ of the Petri net to the $L$ levels of the MDD. We limit
ourselves to mapping each place to a different level, i.e., requiring a variable
order $\lambda : \mathcal{P} \to \{1, \ldots, L\}$, where $L = |\mathcal{P}|$. It
is known that the choice of $\lambda$ can exponentially affect the size of the MDD
and finding an optimal mapping is NP-complete [9]. We stress that we consider
only the final size of the MDD. In reality, the fixpoint iterations to compute
$\mathcal{S}_{rch}$ or the intersections to compute $\mathcal{S}_{sat}$ can lead
to an intermediate size of the MDD (peak size) that is normally
much larger than the final size. However, our work to reduce the final MDD size is
largely orthogonal to other strategies (like saturation [13] for
$\mathcal{S}_{rch}$ construction) aimed at reducing the peak size, thus both can
be employed to improve efficiency.

The MDDs in Fig. 2 encode $\mathcal{S}_{rch}$ for the Petri net of Fig. 1(A), for
two different variable orders. More precisely, Fig. 2 shows, left to right, and
for each order, the variable order (with level $L$ at the top), the place bounds,
the p-semiflows $\mathcal{F}^+_{min}$ (with the token count at the bottom), and
the corresponding MDDs. The variable order in (B) is poor, resulting in an MDD
with 40 nodes, while that in (A) requires only 19 nodes.
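
To make the effect of the variable order concrete, the sketch below builds a
quasi-reduced MDD for a set of markings by brute force and counts the
nonterminal nodes per level. It is only a sketch under stated assumptions: the
set of markings is taken to be all vectors with entries in {0, 1, 2} that satisfy
the two invariants of Sect. 2.1 (the net of Fig. 1(A) itself is not reproduced
here), the two orders are those read off Fig. 2, and the names `states` and
`mdd_nodes_per_level` are hypothetical.

```python
from itertools import product

PLACES = ["P0", "P1a", "P2a", "P3a", "P1b", "P2b", "P3b"]
BOUND = 2  # every place holds between 0 and 2 tokens
# (support, token count) of the two p-semiflows; an assumption of this sketch.
INVARIANTS = [
    (["P0", "P1a", "P2a", "P3a"], 2),
    (["P0", "P1b", "P2b", "P3b"], 2),
]


def states():
    """All markings within the bounds that satisfy every invariant."""
    result = []
    for values in product(range(BOUND + 1), repeat=len(PLACES)):
        m = dict(zip(PLACES, values))
        if all(sum(m[p] for p in supp) == tc for supp, tc in INVARIANTS):
            result.append(m)
    return result


def mdd_nodes_per_level(markings, order):
    """Nonterminal node count per level; `order` lists places from level L to 1."""
    unique = {}     # (level, children) -> node id, i.e., the unique table
    per_level = {}  # level -> number of distinct nodes

    def make(level, suffixes):
        if not suffixes:
            return None  # empty set: the edge points to the "false" terminal
        if level == 0:
            return "T"   # the "true" terminal
        children = tuple(
            make(level - 1, frozenset(s[1:] for s in suffixes if s[0] == i))
            for i in range(BOUND + 1)
        )
        key = (level, children)
        if key not in unique:  # no duplicate nodes at the same level
            unique[key] = len(unique)
            per_level[level] = per_level.get(level, 0) + 1
        return unique[key]

    tuples = frozenset(tuple(m[p] for p in order) for m in markings)
    make(len(order), tuples)
    return per_level


ORDER_A = ["P2a", "P1a", "P3a", "P0", "P1b", "P3b", "P2b"]
ORDER_B = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]
nodes_a = sum(mdd_nodes_per_level(states(), ORDER_A).values())
nodes_b = sum(mdd_nodes_per_level(states(), ORDER_B).values())
```

Under these assumptions the sketch yields 19 nodes for order (A) and 40 for
order (B), in agreement with the counts reported above. The construction is
exponential in general and is meant only for toy examples, not as a replacement
for symbolic state space generation.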


2.3 Metrics for Variable Orders 


A metric $M$ is a perfect predictor of MDD size if $M(\lambda_1) < M(\lambda_2)$
implies $|\mathcal{N}(\lambda_1)| < |\mathcal{N}(\lambda_2)|$ for any variable
orders $\lambda_1$ and $\lambda_2$, where $|\mathcal{N}(\lambda)|$ is the number
of nodes in the MDD for $\mathcal{S}$ when using variable order $\lambda$; no
efficiently-computable perfect predictor is known. Metrics have been defined
based on the span of events in the incidence matrix $C$, on the bandwidth of $C$,
on the center of gravity of events, and on p-semiflows. Metrics that consider the
span of each event $t$ (distance between the top and bottom nonzero in
$C^-[\mathcal{P}, t]$ or $C^+[\mathcal{P}, t]$ for the given
variable order) are the Normalized sum of Event Span (NES), the Weighted NES
(WES), Sum of Span (SOS) [24], Sum of Tops (SOT) [11] and Sum of Unique 
and Productive Spans (SOUPS) [25]. Classic bandwidth reduction techniques 
from linear algebra were applied to variable order computation for the first time 
in [23]. The corresponding metrics are Bandwidth (BW), Profile (PROF), or 
Wavefront (WF), computed on a squared matrix derived from the incidence 
matrix C. Point-transition spans (PTS) is the metric used as a convergence cri- 
terion by the widely used heuristic FORCE [3], an algorithm for multi-dimensional 
clustering of graphs that has been adapted to variable order generation. A center 
of gravity for the variables is defined and the orders are measured in terms of 
hyperdistance of the variable from the center of gravity. $PTS^P$ [6] is a
variation of PTS to consider also the effect of p-semiflows in the PTS variable
clustering. Finally, the p-semiflow span (PSF) is the metric optimized by the
heuristic
defined in [5], which works by ordering the variables according to p-semiflows. 
PSF is a measure of the proximity of places that belong to the same p-semiflow. 

An overview of these metrics can be found in [6], which also studies their 
coefficient of correlation to determine the predictive power of each metric over a 
large set of models and of orders. All models in the study are Petri nets, mostly 
conservative. We now provide some details for SOUPS and PSF which, together
with $PTS^P$, have been reported as valuable predictors [6].

SOUPS modifies the sum of transition spans (SOS) metric [24] by considering 
only once the maximal common portion of multiple transition spans having 
the same effect on the marking and avoids counting the bottom portion of a 
transition span if it checks but does not change the marking of the corresponding 
places. SOUPS performs particularly well in conjunction with saturation [13], as 
it tends to result in even smaller peak MDDs. SOS and SOUPS, just like WES 
and NES [24] or SOT [11], are easily computed from the matrices $C^-$ and $C^+$.

PSF is computed analogously to SOS, but considering p-semiflow spans
instead of transition spans:

$PSF(\lambda) = \sum_{\pi \in \mathcal{F}^+_{min}} (\max\{\lambda(v) : \pi[v] \neq 0\} - \min\{\lambda(v) : \pi[v] \neq 0\} + 1).$


In our figures, the column for p-flow $\pi$ has a dark cell with $\pi[v]$ in it
for each place $v$ in $Supp(\pi)$, a light empty cell for each place not in the
support but bracketed by places in the support, and a white empty cell for the
remaining places not in the support. With this notation, PSF is just the count of
the number of non-white squares in the matrix of $\mathcal{F}^+_{min}$.
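
The span computation is simple enough to sketch in a few lines of Python. The
function name `span_metric` and the representation of each p-semiflow by its
support are assumptions of this example; the two supports are those of the
invariants of Sect. 2.1 and the two orders are those of Fig. 2.

```python
def span_metric(supports, order):
    """Sum over the given flows of (highest level - lowest level + 1).

    supports: list of supports, each a list of place names.
    order: places listed from level L (top) down to level 1 (bottom).
    """
    level = {place: len(order) - i for i, place in enumerate(order)}
    total = 0
    for support in supports:
        levels = [level[place] for place in support]
        total += max(levels) - min(levels) + 1
    return total


SEMIFLOWS = [
    ["P0", "P1a", "P2a", "P3a"],
    ["P0", "P1b", "P2b", "P3b"],
]
ORDER_A = ["P2a", "P1a", "P3a", "P0", "P1b", "P3b", "P2b"]
ORDER_B = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]
psf_a = span_metric(SEMIFLOWS, ORDER_A)
psf_b = span_metric(SEMIFLOWS, ORDER_B)
```

With these inputs the sketch gives 8 for order (A) and 13 for order (B), the PSF
values quoted in Sect. 3 for the two orders of Fig. 2. Passing the supports of
all minimal p-flows instead of only the p-semiflows would turn the same function
into the PF metric of Sect. 3.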

There has been a proposal [16] to use p-semiflows to eliminate some state
variables (decision diagram levels) through a greedy heuristic, but later work 
[12] observed that this leads to a loss of locality in the MDD representation of 
the transition relation, and suggested instead to use p-semiflows to merge vari- 
ables, proving that this always reduces the MDD size. The same paper [12] also 
proposed to modify the sum-of-transition-tops (SOT) metric so that it considers 
also a set of linearly independent p-semiflows, but provided no hints about the 
relative weight given to transitions vs. p-semiflows when computing the metric. 


3 MDD and Invariants: The PF Metric 


We now begin investigating the relationship between p-flows and the shape of 
the MDD encoding Srch and Ssat, and introduce the new metric PF. 


P-flows and information remembered at level k. The invariant corresponding
to a p-flow $\pi$ imposes a constraint on the reachable markings, since it implies
a constant weighted sum of the tokens in the places belonging to $Supp(\pi)$.
Thus, the MDD must “remember” (using distinct nodes at level $k$) the possible
partial weighted sums corresponding to places in the invariant support that are
above level $k$, as long as the invariant is active, i.e., its support contains
places mapped to levels $k$ or below, and this is true even if the place mapped
to level $k$ is not in the support. Thus, intuitively, places in the support
should be mapped to levels close to each other. This can be easily seen in
Fig. 2(B), where the places in the support of the two p-semiflows in
$\mathcal{F}^+_{min}$ are not in consecutive levels, resulting in more nodes: the
level for $P_{2b}$ has 9 nodes, since the MDD must remember
the partial sum of tokens of the places in the two branches of the Petri net of
Fig. 1(A), and each of them can range from 0 to 2. In the order of Fig. 2(A), all
places in the top branch are instead above the level of place $P_0$, which is in
turn above all places in the bottom branch. Thus, level $P_0$ has only three
possible values to remember: whether in the top (and thus in the bottom) branch
there are 0, 1, or 2 tokens (and therefore $P_0$ has 2, 1, or 0 tokens,
respectively). This dependence is captured by the metric PSF of Sect. 2.3. The
PSF value for order (B) is 13, while it is 8 for order (A), consistent with the
intuition that a smaller value of PSF results in a smaller MDD.


P-flows and singletons. The token count of an invariant $\pi$ determines a single
possible value for the number of tokens in the level “completing” $\pi$ (the
lowest level corresponding to a place in $Supp(\pi)$), which can then only contain
singletons (nodes with a single outgoing edge). This is the case for level $P_0$
in the MDD of Fig. 2(A) and $P_0$ in the MDD of Fig. 2(B). Interestingly, level
$P_{3a}$ in the MDD of Fig. 2(B) also contains only singletons. This is due to an
invariant generated by a p-flow in $\mathcal{F}^-$:


$\pi_3 : \mathbf{m}[P_{1a}] + \mathbf{m}[P_{2a}] + \mathbf{m}[P_{3a}] - \mathbf{m}[P_{1b}] - \mathbf{m}[P_{2b}] - \mathbf{m}[P_{3b}] = 0.$


As p-flows in $\mathcal{F}^-$ have similar implications on the MDD structure as
those in $\mathcal{F}^+$, we define a new metric PF, by extending PSF to consider
also non-positive p-flows. Given a set of p-flows $\mathcal{F}_{min}$, we can
then define:


$PF(\lambda) = \sum_{\pi \in \mathcal{F}_{min}} (\max\{\lambda(v) : \pi[v] \neq 0\} - \min\{\lambda(v) : \pi[v] \neq 0\} + 1).$


But what is an appropriate choice for $\mathcal{F}_{min}$? To have a consistent
definition of the metric we need $\mathcal{F}_{min}$ to be uniquely and
appropriately defined. While p-semiflows are characterized by a unique generator
set $\mathcal{F}^+_{min}$, p-flows can be characterized by a basis, but the choice
of basis is not unique and can lead to a meaningless value of PF (for example if
we choose a basis where each p-flow has the same span over the places, so that any
variable order results in the same value for the PF metric).

Continuing the analogy with PSF, we define $\mathcal{F}_{min}$ as the set of
minimal p-flows, i.e., the g.c.d. of their entries is 1 and their support does not
strictly include the support of any other p-flow; in addition, to avoid
considering both a p-flow and its negative, we assume an arbitrary place order
(unrelated to the MDD variable order) and require the first nonzero entry to be
positive. We now prove that this set $\mathcal{F}_{min}$ is unique and that it can
generate a multiple of any p-flow. In the figures, the set $\mathcal{F}_{min}$ is
shown partitioned into $\mathcal{F}^+_{min}$ and $\mathcal{F}^-_{min} =
\mathcal{F}_{min} \setminus \mathcal{F}^+_{min}$.


Theorem 1. Set $\mathcal{F}_{min}$ is unique, and it spans all p-flow directions,
i.e., given $\pi \in \mathcal{F}$, for some $a \in \mathbb{Z}$, $a\pi$ equals a
linear combination of elements in $\mathcal{F}_{min}$.


Proof. To prove uniqueness, it suffices to show that there can be at most one
minimal p-flow with a given support. Assume by contradiction that there are
two distinct minimal p-flows $\pi_1$ and $\pi_2$ with $Supp(\pi_1) = Supp(\pi_2) =
\mathcal{Q}$, and let $a_1 > 0$ and $a_2 > 0$ be the coefficients in $\pi_1$ and
$\pi_2$ corresponding to the first place $v \in \mathcal{Q}$, respectively. Then,
define $\pi = a_2 \pi_1 - a_1 \pi_2$, so that $\pi[v] = 0$.

If $\pi \neq 0$, then $\pi \in \mathcal{F}$ but $Supp(\pi) \subseteq \mathcal{Q}
\setminus \{v\}$, thus $\pi_1$ and $\pi_2$ cannot be minimal p-flows since their
support strictly contains the support of $\pi$, a contradiction.
If $\pi = 0$, then $a_2 \pi_1 = a_1 \pi_2$, which implies $a_2 = a_1$, since the
g.c.d. of both $\pi_1$ and $\pi_2$ is 1. But then, $\pi_1 = \pi_2$, again a
contradiction.

To prove that $\mathcal{F}_{min}$ spans all p-flow directions, consider
$\pi'_1 \in \mathcal{F}$. There must exist $\pi_1 \in \mathcal{F}_{min}$ with
$Supp(\pi_1) \subseteq Supp(\pi'_1)$; pick $v \in Supp(\pi_1)$ and let
$a_1 = \pi_1[v]$ and $b_1 = \pi'_1[v]$, so that $a_1 \pi'_1 = b_1 \pi_1 + \pi'_2$,
with $Supp(\pi'_2) \subseteq Supp(\pi'_1) \setminus \{v\}$. Either
$Supp(\pi'_2) = \emptyset$, or $\pi'_2$ is a p-flow, in which case we can repeat
the process to obtain $a_2 \pi'_2 = b_2 \pi_2 + \pi'_3$, and so on. Eventually, we
must reach the case $Supp(\pi'_{n+1}) = \emptyset$, i.e., $\pi'_{n+1} = 0$, at
which point we can write $a_1 \cdots a_n \pi'_1 = b_1 a_2 \cdots a_n \pi_1 +
b_2 a_3 \cdots a_n \pi_2 + \cdots + b_n \pi_n$, where $\pi_1, \ldots, \pi_n \in
\mathcal{F}_{min}$, i.e., we can express a multiple of $\pi'_1$ as a linear
combination of elements of $\mathcal{F}_{min}$.

We observe that the size of $\mathcal{F}_{min}$, like that of
$\mathcal{F}^+_{min}$, is at most exponential in $|\mathcal{P}|$, since the proof
of Theorem 1 shows that the elements of $\mathcal{F}_{min}$ must have
incomparable supports.


4 MDD and Invariants: The iRank Metric


As we shall see in Sect. 5, both PSF and PF exhibit significant correlation with
the MDD size. However, there are cases where they do not perform well,
especially when $\mathcal{F}_{min}$ is large. Consider for example the Petri net
of Fig. 1(B), and the three MDDs corresponding to different variable orders in
Fig. 3. This Petri net has many minimal p-flows, $|\mathcal{F}^+_{min}| = 8$ and
$|\mathcal{F}_{min}| = 11$. The three p-flows in $\mathcal{F}^-_{min}$ relate the
places inside each fork-and-join subnet ($P_{ia} = P_{ib}$, for $i = 1, 2, 3$),
while the eight p-semiflows in $\mathcal{F}^+_{min}$ relate the tokens in the
three fork-and-join subnets with those in place $P_0$. The order in Fig. 3(B)
produces the smallest MDD size (37 nodes against the 49 nodes of order (A) and 69
of (C)), but it is the one with the worst (highest) value of PSF. Moreover, PF
also fails to choose the order with the smallest MDD: the smallest value for PF
is 55 for order (A), which is only the second best for MDD size. One reason is
that, when $\mathcal{F}_{min}$ contains many related, dependent constraints
affecting a given MDD level, counting all of them may confuse the metric. On the
other hand, we have seen that considering instead a basis depends strongly on the
choice of vectors included in the basis, with a meaningless metric in the worst
case.

We then propose iRank, a new variable order metric which, like PSF and PF,
is based on linear invariants but, unlike PSF and PF, is unaffected by
redundant minimal p-flows and is independent of the choice of the specific p-flows
being considered, as long as they constitute a generator set. iRank focuses on the
number $\rho(k)$ of linearly independent partial p-flows that are still active at
level $k$. The definition of iRank requires a deeper understanding of the
relationships among the MDD structure and the p-flows, as illustrated next.


[Figure 3 panels: (A) 49 nodes, PSF: 44, PF: 55; (B) 37 nodes, PSF: 52, PF: 58;
(C) 69 nodes, PSF: 42, PF: 57.]

Fig. 3. Three variable orders for the Petri net of Fig. 1(B), and the resulting MDDs.

Given an MDD with root node $r$ and two MDD nodes $p, q \neq \bot$ with
$p.lvl = k$ and $q.lvl = h$, let $\mathcal{A}_p$ (for above) be the set of paths
from $r$ to $p$, $\mathcal{B}_p$ (for below) the set of paths from $p$ to $\top$,
and $\mathcal{C}_{p,q}$ the set of paths from $p$ to $q$:


$\mathcal{A}_p = \{(i_L, \ldots, i_{k+1}) : r[i_L] \cdots [i_{k+1}] = p\}$
$\mathcal{B}_p = \{(i_k, \ldots, i_1) : p[i_k] \cdots [i_1] = \top\}$
$\mathcal{C}_{p,q} = \{(i_k, \ldots, i_{h+1}) : p[i_k] \cdots [i_{h+1}] = q\},$


thus $\mathcal{A}_r = \mathcal{B}_\top = \{()\}$, $\mathcal{A}_\top =
\mathcal{B}_r = \mathcal{S}_r$, $\mathcal{A}_p = \mathcal{C}_{r,p}$,
$\mathcal{B}_p = \mathcal{C}_{p,\top}$, $\mathcal{C}_{p,q} = \emptyset$ if
$q.lvl > p.lvl$. When using an MDD to store $\mathcal{S}$ with a given variable
order $\lambda$, the sets of paths defined by $\mathcal{A}_p$, $\mathcal{B}_p$,
and $\mathcal{C}_{p,q}$ also denote sets of submarkings, by interpreting $i_k$
as the number of tokens in place $v = \lambda^{-1}(k)$, and so on.


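Theorem 2 [28]. The nodes at level $k$ can be used to define a partition of
$\mathcal{S}_r$: $\bigcup_{p \in \mathcal{N}_k} \mathcal{A}_p \times \mathcal{B}_p
= \mathcal{S}_r$, and $\forall p, q \in \mathcal{N}_k$, $p \neq q \Rightarrow
(\mathcal{A}_p \times \mathcal{B}_p) \cap (\mathcal{A}_q \times \mathcal{B}_q) =
\emptyset$.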


We can relate MDD nodes and the p-flows by proving that all submarkings
described by $\mathcal{C}_{p,q}$, therefore by $\mathcal{A}_p$, have the same
partial sum for any given p-flow. Given nodes $p$ and $q$ with $p.lvl = k >
q.lvl = h$, $\sigma = (i_k, \ldots, i_{h+1}) \in \mathcal{C}_{p,q}$,
and a p-flow $\pi \in \mathcal{F}$, we let the partial sum of submarking $\sigma$
for invariant $\pi$ be:


$Sum(p, q, \sigma, \pi) = \sum_{k \ge j > h} i_j \cdot \pi[\lambda^{-1}(j)].$


In particular, for any $\sigma \in \mathcal{C}_{r,\top} = \mathcal{S}_{rch}$, we
have $Sum(r, \top, \sigma, \pi) = Tc(\pi)$.

We can now introduce two fundamental properties enjoyed by an MDD
encoding a state space subject to a set of p-flows $\mathcal{F}$, which will pave
the way to the definition of our new metric called iRank.


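Theorem 3. Assume a set of states $\mathcal{S}$ subject to the set of p-flows
$\mathcal{F}$ is encoded by an MDD rooted at $r$. Then, all paths between a given
pair of nodes have the same partial sum for any given invariant:
$\forall \sigma, \sigma' \in \mathcal{C}_{p,q}, \forall \pi \in \mathcal{F},
Sum(p, q, \sigma, \pi) = Sum(p, q, \sigma', \pi)$. We can therefore write
$Sum(p, q, \pi)$.


Proof. Consider two nodes $p$ and $q$, with $p.lvl = k > q.lvl = h$, two paths
$\sigma$ and $\sigma'$ from $p$ to $q$, and any $\sigma_a \in \mathcal{A}_p$ and
$\sigma_b \in \mathcal{B}_q$, so that both $(\sigma_a, \sigma, \sigma_b)$
and $(\sigma_a, \sigma', \sigma_b)$ describe markings in $\mathcal{S}$. Then, for
any p-flow $\pi \in \mathcal{F}$, we have that
$Sum(r, \top, (\sigma_a, \sigma, \sigma_b), \pi) =
Sum(r, \top, (\sigma_a, \sigma', \sigma_b), \pi) = Tc(\pi)$. However,
$Sum(r, \top, (\sigma_a, \sigma, \sigma_b), \pi) = Sum(r, p, \sigma_a, \pi) +
Sum(p, q, \sigma, \pi) + Sum(q, \top, \sigma_b, \pi)$ and
$Sum(r, \top, (\sigma_a, \sigma', \sigma_b), \pi) = Sum(r, p, \sigma_a, \pi) +
Sum(p, q, \sigma', \pi) + Sum(q, \top, \sigma_b, \pi)$,
thus we must have $Sum(p, q, \sigma, \pi) = Sum(p, q, \sigma', \pi)$.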


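An even stronger property holds if the MDD encodes $\mathcal{S}_{sat}$: then,
every node in the MDD is completely identified by a unique pattern of partial
p-flow sums.


Theorem 4. Let $\mathcal{S}_{sat}$ be encoded by an MDD rooted at $r$. Then, the
nodes at level $k$ have different partial sums:


$\forall p, p' \in \mathcal{N}_k,\ p \neq p' \Rightarrow \exists \pi \in \mathcal{F},\ Sum(r, p, \pi) \neq Sum(r, p', \pi).$

Proof. Remember that $\mathcal{S}_{sat} = \{\mathbf{m} \in
\mathbb{N}^{\mathcal{P}} : \forall \pi \in \mathcal{F}, \mathbf{m} \cdot \pi =
Tc(\pi)\}$. Assume that distinct nodes $p, p' \in \mathcal{N}_k$ satisfy
$\forall \pi \in \mathcal{F} : Sum(r, p, \pi) = Sum(r, p', \pi)$.
Since the MDD is canonical, $p$ and $p'$ must encode different sets, thus there
must be a $\sigma$ in $\mathcal{B}_p \setminus \mathcal{B}_{p'}$ or in
$\mathcal{B}_{p'} \setminus \mathcal{B}_p$ (w.l.o.g. assume the former case).
Then, considering any $\sigma_a \in \mathcal{A}_p$ and $\sigma'_a \in
\mathcal{A}_{p'}$, we have $(\sigma_a, \sigma) \in \mathcal{S}_{sat}$ and
$(\sigma'_a, \sigma) \notin \mathcal{S}_{sat}$.
But $(\sigma_a, \sigma) \in \mathcal{S}_{sat}$ implies $\forall \pi \in
\mathcal{F}, Sum(r, \top, (\sigma_a, \sigma), \pi) = Tc(\pi)$ and, since
$Sum(r, \top, (\sigma_a, \sigma), \pi) = Sum(r, p, \sigma_a, \pi) +
Sum(p, \top, \sigma, \pi) = Sum(r, p', \sigma'_a, \pi) + Sum(p, \top, \sigma, \pi)
= Sum(r, \top, (\sigma'_a, \sigma), \pi)$, and this holds for any $\pi$ in
$\mathcal{F}$, we should also have $(\sigma'_a, \sigma) \in \mathcal{S}_{sat}$, a
contradiction.


[Figure 4: an invariant matrix with, for each level $k$, the portions used to
compute the ranks and the resulting values; in the depicted example, iRank = 14.]

Fig. 4. Computations of the rank weights from a matrix F with rank(F) = 4.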


Theorem 4 implies that every node in the MDD encoding Ssat is completely 
identified by a unique pattern of partial p-flow sums. However, not every p-flow 
is relevant at a given level k of the MDD, and, more importantly, the portions 
from level L to level k of different p-flows may encode the same information, 
i.e., may be linearly dependent, yet these redundant portions contribute to the 
computation of the PF metric. iRank, then, attempts to estimate the number of
possible combinations of partial path sums that may actually be found in the 
nodes at level k of the MDD, taking into account these linear dependencies. 

To this end, we consider the |P| x |Fmin| matrix F (rows ordered according to 
A, columns in any order) describing the p-flows in Fmin, and define the number 
Pup(k) of linearly independent partial p-flows up to level k: 


Pup(k) = rank(F[L: k+1,-]), 


where F[L : k +1,-] is the submatrix of F with rows L through k + 1 (level k 
is excluded because we are counting the partial sums reaching level k). pup(k) 
counts both p-flows active at level k and those that are not, as the lowest place 
in their support is mapped to a level above k (p-flow already “closed” at level 
k). The number pdown(k) of linearly independent closed p-flows at level k is 
obtained by subtracting the rank of submatrix F[k : 1,-] from the rank of the 
entire matrix F: 


Pdown(k) = rank(F) — rank (F[k al ‘)). 
Then, the value we are seeking is the difference of these two quantities: 


p(k) = Pup(k) = Pdown(k). 
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Figure 4 depicts the definition of pup(k) and paown(k). The rectangles in the 
invariant matrix F represents the portions used to compute the ranks for level k. 
The values of pup(k), Pdown(k), and p(k), for all levels k, are listed on the right. 

The value of the iRank metric is then the sum of all the p(k) values:


iRank = Σ_{1 ≤ k ≤ L} p(k),


which can be thought of as an estimate of the number of independent factors 
affecting the number of MDD nodes at the various levels. Thus, we should expect 
that a linear increase in iRank implies an exponential increase in the MDD size. 
The main advantage of iRank is that it does not suffer in the presence of an 
excessive number of p-flows (as do PF and PSF). Indeed, since the metric is 
computed on the rank of F and on the rank of sets of rows of F, and since these 
ranks do not change while adding linear combinations of p-flows (larger F) or by 
removing p-flows (smaller F) as long as we remove only linear dependent vectors, 
we have a metric that is rather robust. Additionally, it is also fairly inexpensive 
to compute, O(min{P, T}’). 
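The definitions of p_up(k), p_down(k), and iRank above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GreatSPN implementation: the function names `rank` and `irank` are hypothetical, the matrix is assumed to be given as a list of integer rows with the first row holding the place mapped to level L and the last row the place mapped to level 1, and the rank is computed by plain exact Gaussian elimination rather than by any optimized routine.

```python
from fractions import Fraction

def rank(rows):
    """Rank of an integer matrix (list of rows), by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        if r == len(m):
            break
        # Look for a nonzero pivot in column c, at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def irank(F):
    """Sum of p(k) = p_up(k) - p_down(k) over levels k = 1..L.

    Assumption: F[0] is the row of level L (top), F[-1] the row of level 1.
    """
    L = len(F)
    total = rank(F)
    result = 0
    for k in range(1, L + 1):
        above = F[:L - k]        # rows of levels L .. k+1
        at_or_below = F[L - k:]  # rows of levels k .. 1
        p_up = rank(above)
        p_down = total - rank(at_or_below)
        result += p_up - p_down
    return result
```

For instance, a single p-flow covering three places, `irank([[1], [1], [1]])`, contributes at levels 2 and 1, while two p-flows that each cover a single place, `irank([[1, 0], [0, 1]])`, carry no information across levels.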


5 Experimental Assessment of the Metrics 


We now experimentally assess the efficacy of PF and iRank: since the relationship
between p-flows and MDD nodes is stronger for Ssat than for Srch (Theorem 4),
we expect higher correlation when the MDD encodes the former. We also seek to 
determine whether these metrics can be used to drive iterative heuristics or meta- 
heuristics that compute variable orders. All experiments are on different sets of 
orders for 40 models taken from the Petri Net Repository [2]. The experiments 
have been conducted using the GreatSPN tool [1,4], which uses the Meddly 
library [8]. All MDDs generated had fewer than one million nodes. We follow the 
evaluation procedure of [6] and compute the Spearman coefficient of correlation 
(CC), whose interpretation is: [0.8, 1] means very strong correlation, [0.6, 0.8]
strong correlation, [0.4, 0.6] moderate correlation, and so on decreasing. Negative 
values indicate anti-correlation. 
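As a reminder of what the Spearman coefficient measures, the following sketch computes it as the Pearson correlation of the ranks of the two series (metric values and MDD sizes). It is a generic textbook formulation with average ranks for ties, not the code used for the experiments; the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of the values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equally long series (both must be non-constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(metric_values, mdd_sizes):
    """Spearman CC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_values), average_ranks(mdd_sizes))
```

Because only ranks matter, a metric that grows linearly while the MDD size grows exponentially still yields a coefficient of 1.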

Figure 5 compares the correlation of iRank and PF to that of the metrics of 
Sect. 2.3. Although all experiments have been performed, for the sake of space only
6 metrics are considered in the tables. We have chosen to include PSF, PF and
iRank (for obvious reasons), plus the best among the C span metrics (SOUPS),
and two versions of PTS (PTS and PTS_P, without and with p-flow) since PTS is
the metric implicitly optimized by the widely used FORCE heuristic. No bandwidth
metric is reported since they all exhibit at best a moderate correlation.
Each row represents a metric; columns report the CC of the metrics with the
MDD encoding Ssat (columns [A] and [B]) and Srch (columns [C] and [D]) for two
different sets of orders. The CC of a single model for a single metric is computed
from the bivariate series relating, for each variable order A, the MDD size built
using A with the value of the metric for that A. ICC is the CC computed over
the set of orders A in V_IMPR and BCC is computed over V_BEST. The sets V_IMPR
and V_BEST are built from 1,000 initial random orders by generating sequences of

Fig. 5. Two correlation coefficients for different metrics for Ssat and Srch


increasingly better orders (in terms of MDD final size) until a convergence criterion
is satisfied; V_IMPR retains all orders while V_BEST retains only the last order
of each sequence (thus exactly 1,000 orders). This construction is explained in
[6], where it was observed that V_BEST tends to contain mostly good orders, and
V_IMPR a mixture of good and bad orders. The above sets have been built for
each of the 40 models. For each combination, we report the mean CC (over all
models) and the CC distribution for the 40 models; the x axis is partitioned into
20 bins, so the y axis indicates the number of models whose CC falls into each
bin. All plots have the same scale on the y axis, and the height of the bar at 0
is fixed at 36 for all rows.

The results of Fig. 5 indicate that iRank has the highest correlation for both
ICC and BCC and for both Srch and Ssat. iRank is better than the second best
by 12% (ICC on Ssat) and up to 28% (BCC on Srch). The comparison with PTS
(the metric used as a convergence criterion by the widely used FORCE heuristic) is
even more striking. It is also evident that in none of the four cases does PF perform
better than PSF, supporting our observation that considering more p-flows is
not always (or even usually) a good idea. Figure 5 also indicates that all metrics
have better CC when the MDD encodes Ssat (column [A] vs. [C], and column
[B] vs. [D]). This is not surprising for iRank, given Theorem 4, but it also holds
for all other metrics. This could be due to the fact that, since Srch ⊆ Ssat, the
MDD for Srch encodes additional constraints not captured by any of the metrics.

Comparing columns [A] and [B] (and columns [C] and [D]) of Fig. 5, we
observe that ICC is higher than BCC for all metrics, meaning that they have
better correlation when the set of considered orders is V_IMPR (mix of good and
bad orders) rather than V_BEST (mostly good orders). This is related to the use of
the Spearman CC, which quantifies how well the i-th largest value of the metric
correlates with the i-th largest value of the MDD size: certainly with V_BEST we
tend to have more MDDs of similar size, making it more difficult to discriminate.

The experiments reported in Fig. 6 serve to evaluate whether the metrics
can be used as an objective function inside a simulated annealing procedure
(columns [A] and [C]) or as a meta-heuristic to select one among the orders
produced during the simulated annealing (columns [B] and [D]). Given an initial
variable order and a metric m, the procedure searches for an “optimal” order through

Metric | [A] Ann. Distrib. (Ssat), mean | [B] Metaheuristic (Ssat), mean | [C] Ann. Distrib. (Srch), mean | [D] Metaheuristic (Srch), mean
iRank  | 0.819 | 0.885 | 0.665 | 0.806
PSF    | 0.748 | 0.806 | 0.607 | 0.719
PF     | 0.717 | 0.785 | 0.574 | 0.703
SOUPS  | 0.690 | 0.759 | 0.590 | 0.731
PTS_P  | 0.684 | 0.762 | 0.584 | 0.716
PTS    | 0.641 | 0.726 | 0.554 | 0.684
(Each cell of the original figure also shows the score distribution as a histogram over [0, 1].)

Fig. 6. Evaluation of metrics on simulated annealing produced orders

Fig. 7. Evaluation of metrics on FORCE-produced orders.


a simulated annealing procedure [17], aimed at minimizing the value of m. We 
employ a standard simulated annealing procedure, described in [6]. Unlike the 
construction of the set of orders used for the computation of ICC and BCC in 
Fig. 5, no MDD is built during the construction of the candidate variable order. 
For each metric, the simulated annealing procedure is run 1,000 times, from 
different initial orders, and Fig. 6 reports, in columns [A] and [C], the mean and 
distribution of the “score” of the MDDs built using the 1,000 orders produced by 
the 1,000 runs of the simulated annealing for each metric m, for the 40 models. 
The score is the distance from the size of the smallest MDD built, normalized 
on the distance between the smallest and the largest MDD size built (see [6], 
Eq. 5), obviously computed separately for each model. A value of 1 for order A 
for a given model indicates that the smallest MDD seen for that model was built 
using A. A value of 0 indicates the worst order. Column [A] refers to the MDDs 
storing Ssat, while column [C] refers to Srch. Again, iRank performs better than
any other metric in both cases.
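To make the role of the metric as an objective function concrete, here is a generic simulated annealing sketch over variable orders. It is not the procedure of [6]: the swap neighborhood, the geometric cooling schedule, the parameter defaults, and the name `anneal_order` are all assumptions chosen for illustration. As in the text, only the metric is evaluated; no MDD is built.

```python
import math
import random

def anneal_order(initial_order, metric, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for an order with a small metric value (needs at least two variables).

    Returns the best order seen, so the result is never worse than the input.
    """
    rng = random.Random(seed)
    current = list(initial_order)
    current_cost = metric(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Neighbor: swap two distinct positions of the current order.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cost = metric(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worsening moves with
        # probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
    return best
```

In the setting of the paper, `metric` would be a function such as iRank evaluated on the matrix F with rows permuted according to the candidate order.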

Columns [B] and [D] instead report the results of using each metric m as a 
meta-heuristic: for each model a single order is chosen (the order with the best 
value for metric m), and the 40 resulting scores are plotted. This corresponds to 
using metrics in practice to select a given order for a model. Again, iRank shows 
the best performance, indicating that it can select good candidate orders. 
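The score and the metric-driven selection can be sketched as follows. The exact formula is Eq. 5 of [6], which is not reproduced in this text; the `score` function below is one reading that matches the stated endpoints (1 for the smallest MDD, 0 for the largest). The names `score` and `select_order`, and the convention that a lower metric value is better, are assumptions.

```python
def score(size, smallest, largest):
    """Normalized score of an MDD size: 1.0 for the smallest, 0.0 for the largest."""
    if largest == smallest:
        return 1.0  # all orders produced the same size
    return 1.0 - (size - smallest) / (largest - smallest)

def select_order(orders, metric):
    """Meta-heuristic: pick the candidate order with the best (lowest) metric value."""
    return min(orders, key=metric)
```

The score of the order returned by `select_order` is then computed from the MDD actually built with that order, separately for each model.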

Figure 7 shows the evaluation of a meta-heuristic also defined in [6], based 
on FORCE. Each metric m is used to drive the selection of the “best” variable 
order among a set of variable orders produced using FORCE from an initial set 
of 1,000 random orders. This is done for each of the 40 models. The last row 
is the baseline (40 × 1,000 points, all computed using FORCE), while all other
histograms are built out of 40 MDD sizes, one per model. A mean value greater
than the baseline mean indicates that the metric selects the best orders among
the ones computed by FORCE. A mean below the baseline indicates otherwise.
Again, when we employ iRank to select the order to use, we get a better score
than with any other metric for both Ssat (left column) and Srch (right column).


6 Conclusion and Future Work 


We considered the problem of defining and evaluating variable orders for MDDs 
encoding either the reachable states of a DEDS (Srch) or the states satisfying a
set of linear invariants (Ssat). We studied the relation between the MDD size and 
structure and the linear invariants, and proposed two new metrics: PF, a trivial 
extension to PSF; and iRank. Through a set of experiments, metrics have been 
evaluated both as predictors of the MDD size and as drivers for two heuristics 
(and associated meta-heuristics). The experiments follow the procedure proposed 
in [6], as defining a good and fair procedure to compare metrics and MDD sizes 
for a set of models is a nontrivial task. The results show that iRank is better than
any other metric we found in the literature.

The definition of iRank, and of PF, assumes that linear invariants are available.
For DEDSs specified as Petri nets, the linear invariants are derived from the p- 
flows, the left annullers of the incidence matrix, an integer matrix describing how 
an event modifies a state. Clearly, whenever a DEDS can be specified through 
a similar matrix, the application of our method is straightforward, as in the 
case of various formalisms used in system modeling and verification. For other 
formalisms, this may be less immediate, but our method only assumes a set of 
linear invariants on the state space, regardless of how they are computed. 

In our experiments, we considered only conservative Petri nets, where each 
place appears in at least one invariant. This allowed us to compare with previ- 
ously defined metrics that exploit linear invariants generated from p-semiflows. 
If no invariants are available, or if most places are not part of any invariant, PF 
and iRank could perform very poorly. If a net is not conservative, a subset of
places may “lose” tokens, “gain” tokens, or both. The last two cases cause Srch
to be infinite, but the first case can still be managed by our approach, thanks
to p-flows. As an example, consider the net obtained from the net in Fig. 1(B)
by removing the arc from transition T3 back to P. Such a net does not have
any p-semiflow, but all the places between each pair of fork-and-join belong to
a p-flow, allowing us to apply our method. A further extension could consider
invariants where the weighted sum of tokens in a subset of places is less than or
equal to a constant (instead of just equal).

Several directions for additional exploration remain. First, iRank does not 
consider the initial state of the DEDS, but the number of nodes at a given 
level depends on the token count of the p-flows, and this may be especially 
important when the p-flows have significantly different token counts. Second, the
efficient computation of iRank is important, as heuristics using it are likely
to evaluate it many times. The computation could be expensive since it
involves matrix rank computations.

Finally, we are interested in extending iRank to more general constraints,
which can still provide hints on good variable orders; for example, a constraint
“if A = 3 then B = C” imposes no limitations on C along paths where A ≠ 3
(assuming A is above B and B is above C in the MDD), but requires
remembering the value of B until reaching C along paths where A = 3.


Abstract. Various versions of binary decision diagrams (BDDs) have
been proposed in the past, differing in the reduction rule needed to give 
meaning to edges skipping levels. The most widely adopted, fully-reduced 
BDDs and zero-suppressed BDDs, excel at encoding different types of 
boolean functions (if the function contains subfunctions independent of 
one or more underlying variables, or it tends to have value zero when 
one of its arguments is nonzero, respectively). Recently, new classes of 
BDDs have been proposed that, at the cost of some additional complex- 
ity and larger memory requirements per node, exploit both cases. We 
introduce a new type of BDD that we believe is conceptually simpler, 
has small memory requirements in terms of node size, tends to result in 
fewer nodes, and can easily be further extended with additional reduc- 
tion rules. We present a formal definition, prove canonicity, and provide 
experimental results to support our efficiency claims. 


1 Introduction 


Decision diagrams (DDs) have been widely adopted for a variety of applications. 
This is due to their often compact, graph-based representations of functions over 
boolean variables, along with operations to manipulate those boolean functions 
based on the sizes of the graph representations, rather than the size of the domain 
of the function. Most DD types are canonical for boolean functions: for a fixed 
ordering of the function variables, each function has a unique (modulo graph 
isomorphism) DD representation, or encoding. 

Compactness and canonicity are achieved through careful rules for eliminating
nodes. All canonical DDs eliminate nodes that duplicate information: if nodes p 
and q encode the same function, one of them is discarded. Additional compact- 
ness comes from a reduction rule (or rules) that specifies both how to interpret 
“long” edges that skip over function variables, and how to eliminate nodes and 
replace them with long edges. Two popular forms of decision diagrams, Binary 
Decision Diagrams (BDDs) [1] and Zero-suppressed binary Decision Diagrams 
(ZDDs) [8], use different reduction rules. Some applications are more suitable 
for BDDs while others are more suitable for ZDDs, depending on which of the 
two reductions can be applied to a greater number of nodes. Unfortunately, it is 
not always easy to know, a priori, which reduction rule is best for a particular 
application. Worse, there are applications where both rules are useful.

Recently, Tagged BDDs (TBDDs) [10] and Chain-reduced BDDs (CBDDs)
or ZDDs (CZDDs) [2] have been introduced to combine the reduction rules 
of BDDs and ZDDs. We introduce a new type of BDD, called Edge Specified 
Reduction BDDs (ESRBDDs), that we believe is conceptually simpler and has 
smaller node storage requirements than TBDDs, CBDDs, and CZDDs, while still 
exploiting the BDD and ZDD reduction rules. Additionally, ESRBDDs are flexi- 
ble in that additional reduction rules may be added with low cost. Finally, unlike 
TBDDs, CBDDs, and CZDDs, ESRBDDs treat the BDD and ZDD reduction 
rules equally: there is no need to prioritize one rule over another. 

The paper is organized as follows. Section 2 recalls definitions for BDDs and
ZDDs and describes related work. Section 3 formally defines ESRBDDs, gives
their reduction algorithm, proves that they are a canonical form, and compares
them with related DDs. Section 4 gives detailed experimental results to show
how the various DDs compare in practice. Section 5 provides conclusions.


2 Related Decision Diagrams 


We focus on various types of DDs that have been proposed to efficiently encode 
boolean functions of boolean variables, and briefly recall DDs relevant to our 
work. For consistency in notation, all DD types we present encode functions of 
the form f : B^L → B and have L levels, with level L at the top.

The first and most widely-known type is the reduced-ordered binary decision
diagrams (BDDs) [1]. A BDD is a directed acyclic graph where the two terminal
nodes 0 and 1 are at level 0, written lvl(0) = lvl(1) = 0, while each nonterminal
node p belongs to a level lvl(p) ∈ {1, ..., L} and has two outgoing edges, p[0]
and p[1], pointing to nodes at lower levels (this is the “ordered” property). The
“reduced” property instead forbids both duplicate nodes (p and q are duplicates if
lvl(p) = lvl(q), p[0] = q[0], and p[1] = q[1]), and redundant nodes (p is redundant
if p[0] = p[1]). The function F_p encoded by BDD node p is defined as


F_p(x_{1:L}) =
  p                              if lvl(p) = 0,
  F_{p[x_{lvl(p)}]}(x_{1:L})     if lvl(p) > 0,


where (x_{1:L}) is a shorthand for the boolean tuple (x_1, ..., x_L).

Another widely-used type is the zero-suppressed binary decision diagrams
(ZDDs) [8], which differ from BDDs only in that they forbid high-zero nodes
(node p is high-zero if p[1] = 0) instead of redundant nodes. The function
encoded by ZDD node p is defined with respect to a level n ≥ m = lvl(p),
as


F^n_p(x_{1:n}) =
  0                               if n > m ∧ ∃i, m < i ≤ n, x_i = 1,
  F^m_p(x_{1:m})                  if n > m ∧ ∀i, m < i ≤ n, x_i = 0,
  F^{m-1}_{p[x_m]}(x_{1:m-1})     if n = m > 0,
  p                               if n = m = 0.
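The two definitions differ only in how skipped levels are interpreted, which a small evaluator makes visible. The sketch below is illustrative only: terminals are the Python integers 0 and 1, a nonterminal node is assumed to be a tuple `(level, low_child, high_child)`, and the assignment `x` is a dictionary from level to 0 or 1. None of these representation choices come from the paper.

```python
def lvl(p):
    """Level of a node: 0 for the terminals 0 and 1, else the stored level."""
    return 0 if p in (0, 1) else p[0]

def eval_bdd(p, x):
    """BDD semantics: skipped levels are "don't care"."""
    while lvl(p) > 0:
        p = p[1 + x[lvl(p)]]  # follow p[0] or p[1] according to x at lvl(p)
    return p

def eval_zdd(p, x, n):
    """ZDD semantics with respect to level n >= lvl(p)."""
    m = lvl(p)
    # Every skipped level m+1..n must carry value 0, otherwise the result is 0.
    if any(x[i] == 1 for i in range(m + 1, n + 1)):
        return 0
    if m == 0:
        return p
    return eval_zdd(p[1 + x[m]], x, m - 1)
```

With two variables, the node `(1, 0, 1)` encodes x_1 as a BDD (x_2 is irrelevant) but x_1 ∧ ¬x_2 as a ZDD, and the terminal 1 alone encodes the constant 1 as a BDD but ¬x_1 ∧ ¬x_2 as a ZDD.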


Both BDDs and ZDDs are canonical: any function f : B^L → B has a unique
node p encoding it, an essential property guaranteeing time efficiency. Just as
important is their memory efficiency, i.e., the number of nodes required to encode
a given function. In this respect, BDDs and ZDDs are particularly suited to
different situations. BDDs require fewer nodes if there are many “don’t cares”,
i.e., it often happens that F_p(x_{1:L}) = F_p(y_{1:L}) when x_{1:L} and y_{1:L} differ in one
position, as this corresponds to redundant nodes, not stored in BDDs. ZDDs
require fewer nodes if the function tends to have value 0 when many arguments
have value 1, as this corresponds to high-zero nodes, not stored in ZDDs.

Quasi-reduced BDDs (QBDDs) [5] are also canonical: they are just like BDDs 
(or ZDDs) except they only forbid duplicate nodes. QBDD edges connect nodes 
on adjacent levels. Since edges are not allowed to skip levels, nodes do not need to 
store level information, and redundant and high-zero nodes cannot be eliminated. 
A useful variation is to eliminate only redundant (or high-zero) nodes whose 
children are 0, and thus allow long edges directly to 0. In either case, QBDDs 
require at least as many nodes as BDDs and ZDDs to encode a given function, 
so they provide an upper bound on both the BDD and the ZDD sizes. 

Various decision diagrams have been proposed to combine the characteristics 
of BDDs and ZDDs and exploit the reduction potential of both. Tagged binary 
decision diagrams (TBDDs) [10] associate a level tag to each edge. BDD reduc- 
tions are implied along the edge from the level of the node to the level of the 
tag, and ZDD reductions are implied from the level of the tag to the level of the 
node pointed to by the edge. Alternatively, TBDDs can apply reductions in the 
reverse order along an edge: ZDD reductions first and BDD reductions second. 
Either reduction order can be used in TBDDs, but a TBDD can only use one of 
them, i.e., they cannot both be used in the same TBDD. 

Chain-reduced BDDs (CBDDs) and chain-reduced ZDDs (CZDDs) [2] aug- 
ment BDDs and ZDDs by using nodes to encode chains of high-zero nodes and 
redundant nodes, respectively. Each node specifies two levels, the first level indi- 
cating where the chain starts (similar to the level of an ordinary BDD or ZDD 
node), and the second, additional, level indicating where the chain ends. 

Finally, ordered Kronecker functional decision diagrams [3] allow multiple 
decomposition types (Shannon, positive Davio, and negative Davio), enabling 
both BDD and ZDD reductions. However, each level has a fixed decomposition 
type, thus this approach is less flexible, potentially less efficient, and hindered 
by the need to know which decomposition will perform best for each level. 


3 ESRBDDs 


Definition 1. An L-level (ordered) edge-specified reduction binary decision diagram
(ESRBDD) is a directed acyclic graph where the two terminal nodes 0 and
1 are at level 0, lvl(0) = lvl(1) = 0, while each nonterminal node p belongs to
a level lvl(p) ∈ {1, ..., L} and has two outgoing edges, p[0] and p[1], pointing to
nodes at lower levels. An edge is a pair e = (e.rule, e.node), where e.rule is a
reduction rule in {S, Lo, Ho, X} and e.node is the node to which edge e points.
For i ∈ {0, 1}, if lvl(p[i].node) = lvl(p) − 1, we say that p[i] is a short edge and
require that p[i].rule = S. If instead lvl(p[i].node) < lvl(p) − 1, the only other
possibility, we say that p[i] is a long edge, since it “skips over” one or more levels,
and require that p[i].rule ∈ {Ho, Lo, X}.


The reduction rule on an edge specifies its meaning when skipping levels,
thus it is just S for short edges while, for long edges, the rules Ho, Lo, and X
correspond to the “zero-suppressed” rule of [8], the “one-suppressed” rule (a new
rule analogous to the zero-suppressed, as we shall see), and the “fully-reduced”
rule of [1], respectively. To make this more precise, we recursively define the
boolean function F^n_{(κ,p)} : B^n → B encoded by an ESRBDD edge (κ,p) with
respect to a level n ∈ {0, ..., L}, subject to lvl(p) ≤ n, as


F^n_{(κ,p)}(x_{1:n}) =
  p                                                               if lvl(p) = n = 0,
  (x_n) ? F^{n-1}_{p[1]}(x_{1:n-1}) : F^{n-1}_{p[0]}(x_{1:n-1})           if lvl(p) = n > 0,
  (x_n) ? F^{n-1}_{(κ,p)}(x_{1:n-1}) : F^{n-1}_{(κ,p)}(x_{1:n-1})         if lvl(p) < n, κ = X,
  (x_n) ? 0 : F^{n-1}_{(κ,p)}(x_{1:n-1})                           if lvl(p) < n, κ = Ho,
  (x_n) ? F^{n-1}_{(κ,p)}(x_{1:n-1}) : 0                           if lvl(p) < n, κ = Lo,


where the if-then-else operator (x_n) ? f_1 : f_0 is a shorthand for (¬x_n ∧ f_0) ∨ (x_n ∧ f_1).
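The recursive definition translates almost line by line into an evaluator. The following is a minimal sketch under assumed representations, not the authors' implementation: terminals are the integers 0 and 1, a nonterminal is a hypothetical `Node(level, low, high)` whose `low` and `high` fields are edges, an edge is a pair `(rule, node)` with the rule given as one of the strings "S", "X", "Ho", "Lo", and `x` maps each level to 0 or 1.

```python
from collections import namedtuple

# low is the edge p[0], high is the edge p[1]; both are (rule, node) pairs.
Node = namedtuple("Node", "level low high")

def lvl(p):
    return p.level if isinstance(p, Node) else 0

def eval_edge(rule, p, n, x):
    """Value of the function encoded by edge (rule, p) with respect to level n."""
    if lvl(p) == n:
        if n == 0:
            return p  # terminal reached at level 0
        child_rule, child = p.high if x[n] else p.low
        return eval_edge(child_rule, child, n - 1, x)
    # lvl(p) < n: level n is skipped and the rule decides its meaning.
    if rule == "X":   # fully-reduced: x_n is irrelevant
        return eval_edge(rule, p, n - 1, x)
    if rule == "Ho":  # high branch is 0
        return 0 if x[n] else eval_edge(rule, p, n - 1, x)
    if rule == "Lo":  # low branch is 0
        return eval_edge(rule, p, n - 1, x) if x[n] else 0
    raise ValueError("rule S cannot skip a level")
```

For example, with respect to level 2, the long edge ("Ho", 1) evaluates to 1 only when x_1 = x_2 = 0, the long edge ("Lo", 1) only when x_1 = x_2 = 1, and ("X", 1) is the constant 1.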

We defined an ESRBDD as a directed acyclic graph, so it can potentially 
have multiple roots (nodes with no incoming edges). However, since our focus 
is on the size of the DD encoding a given function, we assume from now on 
that our ESRBDDs have a single root node p*, pointed to by a dangling edge
with rule κ*. We denote the set of all nodes reachable from p* (and therefore
all nodes in the ESRBDD) as Nodes(p*). The dangling edge (κ*,p*) encodes the
function F^L_{(κ*,p*)}, which is independent of κ* only if lvl(p*) = L, in which case
we require κ* = S, while we require κ* ∈ {Lo, Ho, X} if lvl(p*) < L. Finally, we
will informally say “ESRBDD (κ*,p*)” to refer to the entire graph below (and
including) dangling edge (κ*,p*).

Before introducing reduced ESRBDDs and showing they are canonical, we 
need some terminology. We say that an ESRBDD nonterminal node q: 


— duplicates node p if lvl(p) = lvl(q), p[0] = q[0], and p[1] = q[1],
— is redundant if q[0] = q[1] = (κ,p), with κ ∈ {S, X},
— is high-zero if q[0].rule ∈ {S, Ho}, q[1].rule ∈ {S, X}, and q[1].node = 0,
— is low-zero if q[0].rule ∈ {S, X}, q[0].node = 0, and q[1].rule ∈ {S, Lo}.
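These four conditions are purely local tests on a node and its two outgoing edges, as the following sketch shows. The representation is an assumption made for illustration: a hypothetical `Node(level, low, high)` whose `low` and `high` fields are the edges q[0] and q[1], each a pair `(rule, node)` with rules given as strings, and terminals given as the integers 0 and 1.

```python
from collections import namedtuple

# low is the edge q[0], high is the edge q[1]; both are (rule, node) pairs.
Node = namedtuple("Node", "level low high")

def duplicates(q, p):
    """q duplicates p: a distinct node at the same level with the same two edges."""
    return q is not p and q.level == p.level and q.low == p.low and q.high == p.high

def is_redundant(q):
    """Both edges are the same edge (rule, p), with rule S or X."""
    return q.low == q.high and q.low[0] in ("S", "X")

def is_high_zero(q):
    """The 1-edge leads to terminal 0 and the 0-edge carries rule S or Ho."""
    return q.low[0] in ("S", "Ho") and q.high[0] in ("S", "X") and q.high[1] == 0

def is_low_zero(q):
    """The 0-edge leads to terminal 0 and the 1-edge carries rule S or Lo."""
    return q.low[0] in ("S", "X") and q.low[1] == 0 and q.high[0] in ("S", "Lo")
```

Note that the categories are not mutually exclusive: a node whose two edges are both (S,0) is redundant, high-zero, and low-zero at the same time.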


Note that BDDs [1] can be viewed as ESRBDDs where the edge labels are 
restricted to {S,X}, and a reduced BDD corresponds to an ESRBDD with no 
duplicate nodes and no redundant nodes. Similarly, ZDDs [8] can be viewed 
as ESRBDDs where edge labels are restricted to {S, Ho}, and a reduced ZDD 
corresponds to an ESRBDD with no duplicate nodes and no high-zero nodes. 
Also, we note that there is no corresponding definition in the existing literature 
for the version of ESRBDDs where the edge labels are restricted to {S, Lo}.

Fig. 1. Patterns not allowed in RESRBDDs

Fig. 2. Replacement rules for patterns in Fig. 1


Definition 2. An ESRBDD is reduced if the following restrictions hold: 


R1. There are no duplicate nodes. 

R2. There are no redundant nodes. 
R3. There are no high-zero nodes. 

R4. There are no low-zero nodes. 

R5. For any edge e = (κ,0), κ ∈ {S, X}.


The last restriction disallows edges (Ho,0) and (Lo,0) in the reduced ESRBDD.
This is because F_{(Ho,0)} ≡ F_{(Lo,0)} ≡ F_{(X,0)} ≡ 0, and since we want to enforce
canonicity in the reduced ESRBDD, we have arbitrarily chosen (X,0) as the
unique representation for such long edges.


3.1 Reducing an ESRBDD 


An ESRBDD can be converted into a reduced ESRBDD using Algorithm 1. 
The algorithm first replaces any edges (Ho,0) or (Lo,0) with (X,0), to satisfy 
restriction R5. Then, it repeatedly chooses a high-zero, low-zero, redundant, or 
duplicate node q and eliminates it. If node q duplicates node p, then it redirects
all incoming edges from q to p (line 7). Otherwise, q is a high-zero, low-zero, or
redundant node, and lines 9-14 find a node d' with lvl(d') < lvl(q) = n − 1, and
a rule κ' ∈ {X, Ho, Lo} such that F^n_{(S,q)}(x_{1:n}) = F^n_{(κ',d')}(x_{1:n}). Note that a short
edge to node q becomes a long edge to node d' because lvl(d') < lvl(q). For the
special case of d' = 0, any edge to q is equivalent to edge (X,0), so the algorithm
replaces those edges (line 16).
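The choice of κ' and d' just described (lines 9-16 of Algorithm 1) can be sketched as a small function that, given the two outgoing edges of a node q, returns the edge that replaces a short edge to q. This is only an illustration under assumed representations: edges are pairs `(rule, node)` with rules as strings, the terminal zero is the integer 0, other nodes are arbitrary labels, and levels are not checked. The name `replacement_edge` is hypothetical.

```python
def replacement_edge(q_low, q_high):
    """Edge (rule', d') replacing a short edge to q, or None if q must be kept.

    q_low is the edge q[0] and q_high is the edge q[1].
    """
    (r0, n0), (r1, n1) = q_low, q_high
    if q_low == q_high and r0 in ("S", "X"):                   # redundant
        rule, d = "X", n1
    elif r0 in ("S", "Ho") and r1 in ("S", "X") and n1 == 0:  # high-zero
        rule, d = "Ho", n0
    elif r0 in ("S", "X") and n0 == 0 and r1 in ("S", "Lo"):  # low-zero
        rule, d = "Lo", n1
    else:
        return None
    # Special case d' = 0: every edge to q is equivalent to (X, 0).
    return ("X", 0) if d == 0 else (rule, d)
```

Incoming long edges whose rule differs from the returned rule cannot be redirected this way; they need the extra node q' created in lines 20-28 of the algorithm.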

When d' ≠ 0, we have F^n_{(S,q)}(x_{1:n}) = F^n_{(κ',d')}(x_{1:n}) for n = lvl(q) + 1, and
these edges are replaced in line 18. It follows that F^n_{(κ',q)}(x_{1:n}) = F^n_{(κ',d')}(x_{1:n})
for n > lvl(q) + 1; these replacements are made in line 19. For rules κ ∈ {X, Ho, Lo}

Algorithm 1. Reduce an ESRBDD


 1: procedure REDUCE(ESRBDD (κ*,p*))
 2:   V ← Nodes(p*);
 3:   ∀κ ∈ {Ho, Lo}, replace all (κ,0) edges with (X,0);
 4:   while V contains a high-zero, low-zero, redundant, or duplicate node do
 5:     Choose a high-zero, low-zero, redundant, or duplicate node q ∈ V;
 6:     if q duplicates p then
 7:       ∀κ ∈ {S, X, Ho, Lo}, replace all (κ,q) edges with (κ,p);
 8:     else
 9:       if q is a redundant node then
10:         κ' ← X; d' ← q[1].node;
11:       else if q is a high-zero node then
12:         κ' ← Ho; d' ← q[0].node;
13:       else if q is a low-zero node then
14:         κ' ← Lo; d' ← q[1].node;
15:       if d' = 0 then
16:         ∀κ ∈ {S, X, Ho, Lo}, replace all (κ,q) edges with (X,0);
17:       else
18:         Replace all (S,q) edges with (κ',d');
19:         Replace all (κ',q) edges with (κ',d');
20:         for all rules κ ∈ {X, Ho, Lo} \ {κ'}, such that an edge (κ,q) exists do
21:           Create node q' at level lvl(q) + 1 and add q' to V;
22:           if κ = X then
23:             q'[0] ← (κ',d'); q'[1] ← (κ',d');
24:           else if κ = Ho then
25:             q'[0] ← (κ',d'); q'[1] ← (X,0);
26:           else if κ = Lo then
27:             q'[0] ← (X,0); q'[1] ← (κ',d');
28:           Replace all (κ,q) edges with (κ,q') or (S,q');
29:     Remove q from V;


with κ ≠ κ', we cannot replace (κ,q) with a single long edge to node d', because
the edge needs different reduction rules: the κ rule is needed above level lvl(q),
and the κ' rule is needed from level lvl(q) down. So lines 21-27 of the algorithm
create a new node q' at level lvl(q) + 1, of the appropriate shape such that
F^n_{(κ,q)}(x_{1:n}) = F^n_{(S,q')}(x_{1:n}) for n = lvl(q') + 1. It then follows that
F^n_{(κ,q)}(x_{1:n}) = F^n_{(κ,q')}(x_{1:n}) for n > lvl(q') + 1. These replacements are made
in line 28, where the replacement (κ,q') is used for long edges, and (S,q') is used
for short edges.
In the above discussion, any edge that is replaced by the algorithm encodes 
the same function as its replacement, giving us the following lemma. 


Lemma 1. In Algorithm 1, each edge replacement preserves the function
encoded by the ESRBDD (κ*,p*).


It remains to show that the algorithm always terminates. 


Lemma 2. Algorithm 1 terminates in O(|Nodes(p*)|) steps.

Proof: The proof is based on the observation that, at every iteration of the
algorithm, a node q is chosen to be processed (line 5), at most two nodes are
created at level lvl(q) + 1 (line 21), and node q is removed (line 29). These new
nodes (q' on line 21), by construction, satisfy one of the following patterns:
— q'[0] = q'[1] = (κ',d'), where d' ≠ 0, and κ' ∈ {Ho, Lo},
— q'[0] = (X,0), and q'[1] = (κ',d'), where d' ≠ 0, and κ' ∈ {X, Ho},
— q'[0] = (κ',d'), and q'[1] = (X,0), where d' ≠ 0, and κ' ∈ {X, Lo}.


These nodes are neither redundant, high-zero, nor low-zero, but they could be 
duplicates. Since the elimination of duplicate nodes (line 7) does not create 
new nodes, the two nodes created at lvl(q) + 1 result in at most two additional
iterations of the algorithm. Therefore, for every node in the original ESRBDD, 
the algorithm iterates at most three times. 


Theorem 1. Algorithm 1 converts ESRBDD (κ*,p*) to an equivalent reduced
ESRBDD in O(|Nodes(p*)|) steps.


Proof: Lemma 2 establishes that Algorithm 1 terminates in O(|Nodes(p*)|)
steps. Based on the condition of the while loop, when the loop terminates, we 
know that the ESRBDD contains no high-zero, low-zero, redundant, or duplicate 
nodes. From line 3 and the fact that the algorithm never adds an edge of the 
form (Ho,0) or (Lo,0), we conclude that when Algorithm 1 terminates, any edge 
to terminal node 0 must have edge rule S or X. Therefore, when the Algorithm 
terminates, the ESRBDD is reduced. Lemma 1 establishes that Algorithm 1 pro- 
duces an equivalent (in terms of encoded function) ESRBDD. 


While we have established that Algorithm 1 always terminates and produces 
a reduced ESRBDD, we have not yet established that the Algorithm produces 
the same reduced ESRBDD, regardless of the order in which nodes are chosen in 
line 5. This is guaranteed by the canonicity property, discussed next. Addition- 
ally, we note here that, unlike most other decision diagrams (including BDDs, 
ZDDs, CBDDs, CZDDs, and TBDDs), a reduced ESRBDD is not necessarily a
minimum size ESRBDD encoding of a function, even for a fixed variable order,
as elimination of some node q during the reduction could trigger the creation of
two new nodes. An example of this is shown in Fig. 3, where redundant node q
is eliminated. Edges (S,q) and (X,q) can be simply redirected as (X,p), but the
(Ho,q) and (Lo,q) edges require the creation of two new nodes q_Ho and q_Lo.

While the “chaotic” non-deterministic reduction procedure in Algorithm 1 is 
handy in proving termination under the most general conditions, in practice we 
utilize a deterministic depth-first version of this algorithm that reduces a node 
only after having reduced its children. 


3.2 Canonicity of Reduced ESRBDDs 


We are now ready to discuss the canonicity of reduced ESRBDDs, i.e., to show 
that a function has a unique encoding as a reduced ESRBDD. In the following, 
we say that functions F^n_{(κ,p)} and F^n_{(κ',p')} are equivalent, written
F^n_{(κ,p)} ≡ F^n_{(κ',p')}, if F^n_{(κ,p)}(x_{1:n}) = F^n_{(κ',p')}(x_{1:n}) for all possible
inputs (x_{1:n}) ∈ B^n.

Fig. 3. A worst-case example where elimination of node q creates two nodes.


Theorem 2. In a reduced ESRBDD, for any n ∈ N, for any two edges e = (κ,p),
e' = (κ',p') with lvl(p) ≤ n, lvl(p') ≤ n, if F^n_e ≡ F^n_{e'} then (1) p = p', and (2) if
lvl(p) < n then κ = κ'.


Proof: The proof is by induction on n. For the base case, we use n = 0 and
from the definition of F we have F^0_e ≡ F^0_{e'} ⟹ p = p'.


Now, suppose the theorem holds for n = m, where m ≥ 0; we will prove it holds
for n = m + 1. Regardless of (κ,p), we have


F^n_{(κ,p)}(x_{1:n}) = (x_n) ? f_1(x_{1:n-1}) : f_0(x_{1:n-1})


for some functions f_0 and f_1. Similarly, we have

F^n_{(κ',p')}(x_{1:n}) = (x_n) ? f'_1(x_{1:n-1}) : f'_0(x_{1:n-1}).


It follows that F^n_{(κ,p)} ≡ F^n_{(κ',p')} if and only if f_0 ≡ f'_0 and f_1 ≡ f'_1.

First, suppose lvl(p) = n and lvl(p') = n. From the definition of F, it follows
that F^{n-1}_{p[0]} ≡ F^{n-1}_{p'[0]} and F^{n-1}_{p[1]} ≡ F^{n-1}_{p'[1]}. By inductive hypothesis, p[0].node =
p'[0].node and p[1].node = p'[1].node. If lvl(p[0].node) < n − 1, then by inductive
hypothesis, p[0] = p'[0]; otherwise, lvl(p[0].node) = n − 1 and we must have
p[0].rule = S and p'[0].rule = S, thus p[0] = p'[0]. By a similar argument, it
follows that p[1] = p'[1]. We therefore have either that p = p' and the theorem
holds, or that p duplicates p', which is impossible because of restriction R1.

Next, suppose lvl(p) < n and lvl(p') < n. If κ = κ', then in all cases for F we
conclude that F^{n-1}_{(κ,p)} ≡ F^{n-1}_{(κ',p')} and by inductive hypothesis we have that p = p',
so the theorem holds. We now show that κ ≠ κ' is impossible, by contradiction.
Consider the possible cases for κ ≠ κ':


1. $\kappa = X$: If $\kappa' = L_0$ or $\kappa' = H_0$, from the definition of F we conclude that 
   $F^{n-1}_{(\kappa,p)} \equiv F^{n-1}_{(\kappa',p')}$ and that $F^{n-1}_{(\kappa,p)} \equiv 0$. 

2. $\kappa = L_0$: If $\kappa' = H_0$, from the definition of F we conclude that $F^{n-1}_{(\kappa,p)} \equiv 0$ and 
   $F^{n-1}_{(\kappa',p')} \equiv 0$. 

3. The remaining cases are symmetric. 

In all cases, we conclude that $F^{n-1}_{(\kappa,p)} \equiv 0$ and $F^{n-1}_{(\kappa',p')} \equiv 0$. By the inductive 
hypothesis, we have that p = 0 and p' = 0. According to R5, if p = 0 then 
$\kappa$ cannot be $L_0$ or $H_0$. But this implies $\kappa = X$ and $\kappa' = X$, contradicting our 
assumption that $\kappa \ne \kappa'$. 

Finally, suppose lvl(p) = n and lvl(p') < n (the case lvl(p) < n and lvl(p') = n 
is symmetric). We show that this is impossible, by contradiction. Consider the 
possible cases for $\kappa'$: 


1. $\kappa' = X$: From the definition of F, we must have $F^{n-1}_{p[0]} \equiv F^{n-1}_{(\kappa',p')}$ and 
   $F^{n-1}_{p[1]} \equiv F^{n-1}_{(\kappa',p')}$. By the inductive hypothesis, we conclude that p[0].node = p' and 
   p[1].node = p'. If lvl(p') = n - 1, then we have p[0] = p[1] = (S,p'); otherwise, 
   we have lvl(p') < n - 1 and by inductive hypothesis, p[0] = p[1] = ($\kappa'$,p') = 
   (X,p'). Either way, node p is redundant, and from R2 we have a contradiction. 

2. $\kappa' = H_0$: From the definition of F, we must have $F^{n-1}_{p[0]} \equiv F^{n-1}_{(\kappa',p')}$ and 
   $F^{n-1}_{p[1]} \equiv 0$. By the inductive hypothesis, we conclude that p[0].node = p' 
   and p[1].node = 0. If lvl(p') = n - 1, then we have p[0] = (S,p'); otherwise, 
   we have lvl(p') < n - 1 and by inductive hypothesis, p[0] = ($\kappa'$,p') = ($H_0$,p'). 
   Either way, node p is high-zero, and from R3 we have a contradiction. 

3. $\kappa' = L_0$: From the definition of F, we must have $F^{n-1}_{p[0]} \equiv 0$ and 
   $F^{n-1}_{p[1]} \equiv F^{n-1}_{(\kappa',p')}$. By the inductive hypothesis, we conclude that p[0].node = 0 and 
   p[1].node = p'. If lvl(p') = n - 1, then we have p[1] = (S,p'); otherwise, we 
   have lvl(p') < n - 1 and by inductive hypothesis, p[1] = ($\kappa'$,p') = ($L_0$,p'). 
   Either way, node p is low-zero, and from R4 we have a contradiction. 


The canonicity result establishes that, regardless of how an ESRBDD is 
constructed for a given function, the resulting reduced ESRBDD is guaranteed to 
be unique (assuming a given variable order). Thus, we can determine in constant 
time whether two functions encoded as reduced ESRBDDs are equivalent (as is 
already the case for reduced ordered BDDs and ZDDs). From now on, unless 
otherwise specified, we assume that all ESRBDDs are reduced. 
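
The constant-time test follows from storing every reduced node exactly once. The Python sketch below illustrates only this hash-consing idea: the names `UniqueTable`, `make_node`, and `same_function`, as well as the encoding of edges as (rule, node id) pairs, are assumptions made for illustration, and the sketch does not perform the reductions of Algorithm 1.

```python
class UniqueTable:
    """Hash-consing table: structurally identical nodes share one id."""

    def __init__(self):
        self._ids = {}       # (level, low_edge, high_edge) -> node id
        self._next_id = 2    # ids 0 and 1 are reserved for the terminals

    def make_node(self, level, low_edge, high_edge):
        """Return the id of node (level, low_edge, high_edge).

        Edges are (rule, node_id) pairs with rule in {"S", "X", "H0", "L0"}.
        The caller is assumed to pass only nodes that are already reduced.
        """
        key = (level, low_edge, high_edge)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


def same_function(edge_a, edge_b):
    """Compare two edges of one reduced diagram, taken at the same level.

    By canonicity, equal functions imply the same node and, for long edges,
    the same rule, so a single tuple comparison suffices.
    """
    return edge_a == edge_b
```

With such a table, building the same node twice yields the same id, so `same_function(("X", a), ("X", b))` holds whenever `a` and `b` were created from identical arguments.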


3.3 Comparing ESRBDDs to Other Types of Decision Diagrams 


For the remainder of the paper, we consider the relative size of the different 
types of DD based on the interpretation of long edges, namely, BDDs, ZDDs, 
CBDDs, CZDDs, TBDDs, and ESRBDDs. We also consider ESRBDDs without 
the $L_0$ edge label, denoted ESRBDD-$L_0$. These are summarized in Table 1: some 
entries (comparisons between BDDs, ZDDs, CBDDs, and CZDDs) are known 
from prior work [2,6], some entries are discussed below, and some entries are 
unknown. Entry [$T_1$, $T_2$] describes the worst-case increase in the number of nodes, 
as a multiplicative factor. More formally, it is the bound for “number of nodes 
required to encode f using $T_2$” divided by “number of nodes required to encode 
f using $T_1$” for all functions f over L boolean variables. Note that the node 
counts always include both terminal nodes. A factor of 1 indicates that type $T_1$ 
cannot require fewer nodes than type $T_2$. 

Table 1. Worst-case relative increase when converting one DD type into another. 

               BDD      ZDD      CBDD     CZDD     TBDD   ESR-L0   ESR
BDD →          -        L/2 [6]  1 [2]    2 [2]    1      1        1
ZDD →          L/2 [6]  -        3 [2]    1 [2]    1      1        1
CBDD →         ?        ?        -        2 [2]    ?      2        2
CZDD →         ?        ?        3 [2]    -        ?      2        2
TBDD →         ?        ?        ?        ?        -      3        3
ESRBDD-L0 →    L/2      L/2      3        2        1      -        3/2
ESRBDD →       2L/3     2L/3     L/2      L/2      L/2    L/2      -

First, we discuss how an arbitrary BDD can be converted into a TBDD or 
ESRBDD, and fill in the BDD row in Table 1. To build a TBDD from a BDD, 
every edge to a non-terminal node p in the BDD is annotated with the level 
tag lvl(p). By definition, any such annotated edge in a TBDD implies BDD 
reductions for the skipped levels. A TBDD thus constructed is no larger than 
the BDD, and may be further reduced (since it could contain high-zero nodes) 
by applying the TBDD reduction described in [10]. Similarly, we can annotate 
long edges in the BDD with X (Fig. 4(a)), and short edges with S, to obtain an 
unreduced ESRBDD. We then apply Algorithm 1. We now show that this will 
not increase the ESRBDD size, and thus the resulting ESRBDD cannot be larger 
than the original BDD. 


Lemma 3. Suppose we have an unreduced ESRBDD where, for every node q, 
there exists a rule $\kappa \in \{X, H_0, L_0\}$ such that every edge to q is either (S,q) or 
($\kappa$,q). Then reducing the ESRBDD will not increase the number of nodes. 


Proof: Apply Algorithm 1 and in line 5, always choose a node at the lowest 
level. Then, when a node q is chosen, all incoming edges to q will be labeled 
either with S or with $\kappa$. The (S,q) edges will not cause any node to be created. 
The ($\kappa$,q) edges will cause at most one node to be created. But then node q is 
removed. Thus, the overall number of nodes cannot increase. 


It is also easy to convert a ZDD into a TBDD or ESRBDD. To obtain a 
TBDD, annotate every edge from non-terminal node p with the level tag lvl(p), 
so that ZDD reductions are used for all the edges; then reduce the TBDD. To 
obtain an ESRBDD, annotate long edges in the ZDD with $H_0$, see Fig. 4(b), and 
short edges with S, and apply Algorithm 1. 
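
The edge annotation step used for both BDDs and ZDDs can be sketched in a few lines of Python. This is an illustrative sketch only: the node encoding (a dictionary from node id to `(level, low_child, high_child)`, with terminals 0 and 1 at level 0 and non-terminal ids of at least 2) and the function name `annotate_edges` are assumptions, the incoming root edge is not handled, and the subsequent reduction by Algorithm 1 is not shown.

```python
def annotate_edges(nodes, long_rule):
    """Turn a BDD or ZDD into an unreduced ESRBDD by labeling its edges.

    nodes: dict node_id -> (level, low_child, high_child); ids 0 and 1 are
    the terminals (level 0). long_rule is "X" for a BDD and "H0" for a ZDD.
    Returns dict node_id -> (level, (rule, low_child), (rule, high_child)).
    """
    def level_of(node_id):
        return 0 if node_id in (0, 1) else nodes[node_id][0]

    def edge(parent_level, child):
        # A short edge connects adjacent levels; every other edge is long.
        rule = "S" if level_of(child) == parent_level - 1 else long_rule
        return (rule, child)

    return {
        node_id: (level, edge(level, low), edge(level, high))
        for node_id, (level, low, high) in nodes.items()
    }
```

For example, a level-2 node whose high child is terminal 1 receives a long edge labeled X when the input is read as a BDD, and labeled H0 when it is read as a ZDD.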

The conversion from a chained DD to an unreduced ESRBDD is illustrated in 
Fig. 4(c) and (d). For each chain node $x_i : x_j$ with $x_i > x_j$, create a “top node” 
with variable $x_i$, and a “bottom node” with variable $x_j$, that is only pointed 
to by its corresponding top node. In a CBDD, the top node will be a high-zero 
node, and all top nodes and non-chained nodes will have incoming edges labeled 
with X or S. In a CZDD, the top node will be a redundant node, and all top 
nodes and non-chained nodes will have incoming edges labeled with $H_0$ or S. At 
worst, the unreduced ESRBDD has twice the nodes of the original CBDD or 
CZDD and, from Lemma 3, reducing this ESRBDD does not increase its size. 

In a TBDD, each edge can be characterized as short, purely X, purely $H_0$, or 
partly X and partly $H_0$. To convert into an ESRBDD, the short edges are labeled 
with S, the purely X edges are labeled with X, the purely $H_0$ edges are labeled 
with $H_0$. Edges that are partly X and partly $H_0$ require the addition of a node at 
the level where the reduction rule changes, as shown in Fig. 4(e). The worst case 
occurs when every edge requires such a node. Then, since every TBDD node 
has two outgoing edges, the resulting unreduced ESRBDD will have triple the 
number of nodes. Since all of the introduced nodes have incoming X edges, and all 
other nodes have incoming S or $H_0$ edges, from Lemma 3 this ESRBDD will not 
increase in size when it is reduced. We note here that, if there are some purely 
X edges in the TBDD, then Lemma 3 no longer applies; however, the number 
of nodes that would be added during reduction is no more than the number of 
nodes saved by not having to introduce a node on the purely X edges. 

Fig. 4. Converting to ESRBDDs: (a) BDD, (b) ZDD, (c) CBDD, (d) CZDD. 

We now consider converting from ESRBDDs into the other DD types. In the 
case where $L_0$ edges are not allowed (row ESRBDD-$L_0$ in Table 1), the worst 
case BDD is from ESRBDD ($H_0$,1) and the worst case ZDD is from ESRBDD 
(X,1). In both cases, the ESRBDD has 2 nodes, while the resulting BDD/ZDD 
has L + 2 nodes, giving ratios of L/2 + o(L), similar to the discussion in [6, 
p. 250]. The example ZDD in [2], which produces a CBDD with three times 
as many nodes, can be converted into an ESRBDD of the same size. Similarly, 
the example BDD in [2], which produces a CZDD with twice as many nodes, 
can be converted into an ESRBDD of the same size. Any ESRBDD without $L_0$ 
edges can be converted into a TBDD by labeling X edges with a level tag such 
that the X rule is always applied, and labeling $H_0$ edges with a level tag such 
that the $H_0$ rule is always applied. Therefore, the TBDD cannot be larger than 
the ESRBDD. An ESRBDD-$L_0$ can be converted into an ESRBDD by running 
Algorithm 1 to eliminate any low-zero nodes. For each low-zero node that is 
eliminated, we could have an incoming X and $H_0$ edge, causing the creation of 
two nodes. Suppose we eliminate n low-zero nodes that cause creation of two 
nodes. Then, because each low-zero node must have 2 incoming edges, we must 
have 2n incoming edges to these nodes. Above, we must have at least 2n - 1 
nodes to produce these edges. We could then “stack” such a pattern m times. 
This gives an ESRBDD with m(n + 2n - 1) + 2 = m(3n - 1) + 2 nodes, and a 
reduced ESRBDD with m(2n + 2n - 1) + 2 = m(4n - 1) + 2 nodes. The upper 
bound of this ratio is 3/2, which occurs when n = 1 and m goes to infinity. 
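
The limit can be checked with exact arithmetic. The short Python sketch below evaluates the two node counts stated above; the function name `stacked_ratio` is an assumption chosen for illustration.

```python
from fractions import Fraction


def stacked_ratio(n, m):
    """Ratio of node counts after and before eliminating low-zero nodes.

    Uses the counts from the text: m(3n - 1) + 2 nodes before reduction and
    m(4n - 1) + 2 nodes after, for n low-zero nodes per pattern and m stacked
    copies of the pattern.
    """
    before = m * (3 * n - 1) + 2
    after = m * (4 * n - 1) + 2
    return Fraction(after, before)
```

For n = 1 the ratio is (3m + 2)/(2m + 2), which stays below 3/2 and differs from it by exactly 1/(2m + 2); for larger n the ratio tends to (4n - 1)/(3n - 1), which is smaller.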

Table 2. Numbers of nodes for dictionary benchmarks. 

Word List 
Encoding  Alphabet         QBDD          BDD       CBDD        ZDD       CZDD       TBDD        ESR
Binary    Compact     1,120,437    1,120,250    971,387    657,969    657,969    657,902    484,765
Binary    Full        1,285,501    1,285,285  1,153,438    851,555    851,554    851,479    520,576
One-hot   Compact     9,739,638    9,739,638    656,649    311,227    311,227    311,227    311,227
One-hot   Full       22,775,492   22,775,492    656,712    311,227    311,227    311,227    311,227

Password List 
Encoding  Alphabet         QBDD          BDD       CBDD        ZDD       CZDD       TBDD        ESR
Binary    Compact     5,705,516    5,704,777  4,542,925  2,960,478  2,960,465  2,960,209  2,399,272
Binary    Full        5,649,626    5,648,670  4,960,446  3,532,847  3,532,816  3,532,467  2,410,589
One-hot   Compact    72,858,088   72,858,088  3,055,784  1,486,430  1,486,430  1,486,430  1,486,430
One-hot   Full      101,737,047  101,737,047  3,056,067  1,486,430  1,486,430  1,486,430  1,486,430

For the case of ESRBDDs with all types of edges (row ESRBDD in Table 1), 
the $L_0$ edge allows us to build different worst cases. Consider an ESRBDD (S,p) 
where lvl(p) = L, p[0] = ($H_0$,1), and p[1] = ($L_0$,1). This ESRBDD has 3 nodes. 
Because BDDs cannot exploit $H_0$ or $L_0$ edges, this will produce a BDD with 
2(L - 1) + 3 = 2L + 1 nodes, giving a worst-case ratio of 2L/3. The ZDD worst 
case is similar, using instead p[0] = (X,1). Finally, for DD types that can exploit 
both X and $H_0$ edges, the ESRBDD ($L_0$,1) corresponds to the worst case: the 
CBDD, CZDD, TBDD, and ESRBDD-$L_0$ will all require L + 2 nodes. 
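
The 2L + 1 count for the BDD can be reproduced for small L by brute force. The Python sketch below is an illustration under stated assumptions: it reads the ESRBDD (S,p) with p[0] = ($H_0$,1) and p[1] = ($L_0$,1) as the function that is 1 exactly when all L variables are equal (all skipped variables 0 on the low branch, all 1 on the high branch), and it builds a reduced ordered BDD by enumerating all inputs. The names `bdd_node_count` and `all_equal` are hypothetical.

```python
def bdd_node_count(f, num_vars):
    """Count nodes, including both terminals, of the reduced ordered BDD of f.

    f maps a tuple of bits (x_1, ..., x_L) to 0 or 1; level L is the root.
    All 2^L inputs are enumerated, so this is only meant for small L.
    """
    unique = {}  # (level, low, high) -> node id; ids 0 and 1 are terminals

    def build(level, upper_bits):
        # upper_bits holds the values chosen so far for x_L, ..., x_{level+1}.
        if level == 0:
            return int(f(tuple(reversed(upper_bits))))
        low = build(level - 1, upper_bits + (0,))
        high = build(level - 1, upper_bits + (1,))
        if low == high:          # redundant node: eliminated in a BDD
            return low
        key = (level, low, high)
        if key not in unique:    # duplicate nodes are merged
            unique[key] = len(unique) + 2
        return unique[key]

    build(num_vars, ())
    return len(unique) + 2


def all_equal(bits):
    """1 if all variables have the same value, else 0."""
    return int(len(set(bits)) == 1)
```

For L between 2 and 8, `bdd_node_count(all_equal, L)` agrees with 2L + 1: one root, two chains of L - 1 nodes each, and the two terminals.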


4 Experimental Results 


We compare the performance of QBDDs (with long edges to 0), BDDs, ZDDs, 
CBDDs, CZDDs, TBDDs, and ESRBDDs on three sets of benchmarks. The 
first two benchmarks are similar to those used in [2], and are representative of 
general textual information and digital logic functions, respectively. The third 
benchmark is typical in state space analysis of concurrent systems. 


4.1 Dictionaries 


A dictionary can be encoded as an indicator function over the set of strings 
of a given length from either the compact alphabet consisting of the distinct 
symbols found in the dictionary plus NULL, or the full alphabet of all 128 ASCII 
characters (to ensure that all encoded strings have the same length, shorter 
ones are padded with the ASCII symbol NULL). We use the encoding schemes 
described in [2]: one-hot and binary. Therefore, each dictionary generates four 
benchmarks, one for each choice of encoding and alphabet. 

We compare the different DD types on two dictionaries. The first one is the 
English words in file /usr/share/dict/words under MacOS, containing 235,886 
words with lengths ranging from 1 to 24. Its compact alphabet contains lower 
and upper case letters plus hyphen and NULL (54 in total). The second one is a set 
of passwords from SecLists [7] (non-ASCII characters are replaced with NULL), 
containing 999,999 passwords with lengths ranging from 1 to 39. Its compact 
alphabet consists of 91 symbols including NULL. 

Table 3. Numbers of nodes for combinational circuit benchmarks. 

Circuit     QBDD     BDD    CBDD      ZDD    CZDD    TBDD     ESR
C432       2,675   1,506   1,506    2,658   2,189   1,494   1,498
C499      29,483  28,769  28,769   29,316  28,749  28,610  28,428
C880      15,048   6,496   6,496   15,044   9,640   6,496   6,491
C1355     85,694  75,498  75,498   85,439  77,976  75,243  74,757
C1908     18,456  16,210  16,174   17,859  16,047  15,687  15,685
C2670     74,940  15,662  15,658   74,468  21,012  15,539  15,601
C3540    152,523  51,878  51,778  150,539  64,563  50,871  51,146
C5315     26,011   3,793   3,784   25,785   4,749   3,716   3,742

Table 2 reports the number of nodes required to store each dictionary, according 
to different encodings and alphabets (the best result on each row is in boldface). 
Except for QBDDs and BDDs, the one-hot encoding results in fewer nodes, 
demonstrating the effectiveness of the zero-suppressed idea when encoding large, 
sparse data. Among the DD types we consider, ESRBDDs have the fewest nodes, 
regardless of encoding and alphabet. For binary encodings, ESRBDDs use 19%-39% 
fewer nodes than TBDDs, the second best choice. With one-hot encodings, 
ZDDs, CZDDs, TBDDs, and ESRBDDs tie for best because (a) there are no 
redundant nodes and (b) any low-zero nodes that are eliminated do not cause 
an overall decrease in the number of nodes in the ESRBDDs. Indeed, redundant 
nodes are rare even with binary encodings, as they arise when two words $w_1$ 
and $w_2$ not only have bit patterns that differ in a position, but they also share 
all their possible continuations, i.e., $w_1 w'$ is a word if and only if $w_2 w'$ is also a 
word, for all $w'$. In the English word list, “Hlidhskjalf” and its alternate spelling 
“Hlithskjalf” are one such rare instance (note that no $w'$ can continue either of 
them to form an additional word). 


4.2 Combinational Circuits 


BDDs are commonly used to synthesize and verify digital circuits. We select a 
set of combinational circuits from the LGSynth’91 benchmarks [11] and, for each 
circuit, we build a DD encoding all its output logic functions. For each circuit, 
the variable order is determined using Sifting [9] while building the BDD. 

Table 3 reports the number of nodes needed to encode all outputs of each 
circuit. In contrast to the dictionaries, these benchmarks place importance on the 
ability to eliminate redundant nodes. Thus, QBDDs and ZDDs have the worst 
performance. TBDDs and ESRBDDs are always the two best representations, 
and the difference between them is less than 0.7%. 


4.3 Safe Petri Nets 


Decision diagrams are frequently used in symbolic model checking to represent 
sets of states. We have selected a set of 37 safe Petri nets from the 2018 Model 
Checking Contest (https://mcc.lip6.fr/2018/). A Petri net is safe if each one of 
its places can contain at most one token; each place can, therefore, be mapped 
directly to a boolean variable. Most of these models have scaling parameters 
that affect their size and complexity, yielding N = 103 model instances. 

Table 4. Final scores for the safe Petri net benchmarks. 

QBDD   BDD    CBDD   ZDD    CZDD   TBDD   ESR
3.108  2.971  2.038  1.215  1.167  1.160  1.001

Table 5. Number of nodes for a subset of the safe Petri net benchmarks. 

Model                           QBDD        BDD       CBDD        ZDD       CZDD       TBDD        ESR
DiscoveryGPU-PT-14a           80,865     75,682     75,571     43,016     43,016     39,689     40,953
BusinessProcesses-PT-04      282,787    282,787    130,825     67,228     67,228     67,228     66,983
Referendum-PT-0020           343,676    343,676    339,552    194,607    194,607    194,607    184,789
NeoElection-PT-3             414,962    414,962     34,860     15,519     15,519     15,519     15,507
SimpleLoadBal-PT-10          503,777    503,777    376,896    191,460    191,460    191,460    182,403
LamportFastMutEx-PT-4        507,897    507,897    252,361    122,487    122,487    122,487    119,111
AutoFlight-PT-06a            520,755    520,755    356,729    207,409    207,409    207,409    178,855
RwMutex-PT-r0020w0010        553,073    553,073    502,831    358,580    358,580    358,580    195,377
DES-PT-01b                   709,303    709,303    442,610    246,647    246,647    246,647    217,325
Dekker-PT-015              1,191,942  1,191,942    844,466    504,726    504,726    504,726    403,801
Railroad-PT-010            2,109,610  2,109,610  1,096,122    554,541    554,541    554,541    516,121
NQueens-PT-08              3,698,534  3,698,534  2,295,689  1,443,628  1,443,628  1,443,628  1,069,242
ResAllocation-PT-R020C002  5,532,167  5,532,167  4,554,792  2,826,856  2,826,856  2,826,856  2,167,111

Providing detailed results for all the model instances would require excessive 
space, so to summarize over all model instances, Table 4 shows a score for each 
DD type i. The score is the geometric mean [4]: 

$$\mathrm{score}(i) = \left( \prod_{n=1}^{N} \frac{T_i(n)}{T_{\min}(n)} \right)^{1/N}$$

where N is the total number of model instances, $T_i(n)$ is the number of nodes 
needed to represent the state space of instance n using DD type i, and $T_{\min}(n)$ 
is the smallest number of nodes needed to represent the state space of instance n 
by any of the DD types we consider. ESRBDDs have by far the smallest overall 
score, barely larger than 1, indicating that they are either the smallest or slightly 
larger than the smallest for each model instance. 
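
As an illustration of how such a score is computed, the Python sketch below evaluates the geometric mean of the ratios $T_i(n)/T_{\min}(n)$ in log space. The function name `scores` and the input layout are assumptions; the sample uses only two rows of Table 5 and three DD types, so its results are not the values of Table 4, which cover all 103 instances.

```python
import math


def scores(node_counts):
    """Geometric mean of T_i(n) / T_min(n) for each DD type i.

    node_counts: dict dd_type -> list of node counts, one per model instance,
    with all lists in the same instance order.
    """
    types = list(node_counts)
    num_instances = len(node_counts[types[0]])
    best = [min(node_counts[t][n] for t in types) for n in range(num_instances)]
    result = {}
    for t in types:
        # Summing logarithms avoids overflow in the product of many ratios.
        log_sum = sum(math.log(node_counts[t][n] / best[n])
                      for n in range(num_instances))
        result[t] = math.exp(log_sum / num_instances)
    return result


# Rows BusinessProcesses-PT-04 and Referendum-PT-0020 of Table 5.
sample = {
    "BDD": [282_787, 343_676],
    "TBDD": [67_228, 194_607],
    "ESR": [66_983, 184_789],
}
```

A type that is smallest on every instance, such as ESR in this sample, receives a score of exactly 1; every other type scores above 1.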

Table 5 shows $T_i(n)$ for model instances n that required more than 250,000 
nodes in the QBDD representation. For parameterized models that had multiple 
model instances satisfying this criterion, we present data for only the largest 
such model instance. We have also included the results for DiscoveryGPU, the 
only model where ESRBDDs were not the best (they were a close second). 


4.4 Memory Considerations: The Size of Nodes 


So far, we have compared DD types based on how many nodes they require. However, 
the actual memory consumption also depends on the size of the respective 
nodes. All of these DDs store two child pointers. In addition, BDDs and ZDDs 
store a level, CBDDs and CZDDs store two levels, TBDDs store three levels, 
while ESRBDDs store a level and two edge rules. Since all short edges must be 
labeled by S, it is only necessary to label the long edges, and this requires $\lceil \log_2 n \rceil$ 
bits per edge if there are n non-S reduction rules. Without $L_0$ edges, a single 
bit distinguishes $H_0$ from X; otherwise, two bits are required for rules {$H_0$, $L_0$, X}. 
QBDD nodes are therefore the smallest (typically requiring 64 or 128 bits, when 
32-bit or 64-bit pointers are used, respectively) and Table 6 indicates the additional 
cost required for each node type, when the level integers are stored using 
16 bits (as suggested by [2]), 20 bits (as suggested by [10]), and 32 bits. 

Table 6. Overhead of node sizes (bits per node) as compared to QBDD nodes. 

Level bits  BDD  ZDD  CBDD  CZDD  TBDD  ESR-L0  ESR
16          +16  +16  +32   +32   +48   +18     +20
20          +20  +20  +40   +40   +60   +22     +24
32          +32  +32  +64   +64   +96   +34     +36

ESRBDDs are clearly more memory efficient than CBDDs, CZDDs and 
TBDDs. There are a few instances in our experiments where TBDDs use 
marginally fewer nodes than ESRBDDs (less than 3.2% fewer nodes in every 
such instance), but not enough to overcome their per-node memory overhead. 
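
The entries of Table 6 follow directly from the storage rules described above. The Python sketch below recomputes them; the function names `rule_bits` and `overhead_bits` are assumptions made for illustration.

```python
import math


def rule_bits(num_long_rules):
    """Bits needed to label one long edge with one of num_long_rules rules."""
    if num_long_rules <= 1:
        return 0
    return math.ceil(math.log2(num_long_rules))


def overhead_bits(level_bits):
    """Extra bits per node, relative to a QBDD node, for each DD type."""
    return {
        "BDD": level_bits,                         # one level
        "ZDD": level_bits,
        "CBDD": 2 * level_bits,                    # two levels
        "CZDD": 2 * level_bits,
        "TBDD": 3 * level_bits,                    # three levels
        "ESR-L0": level_bits + 2 * rule_bits(2),   # level + two rules from {H0, X}
        "ESR": level_bits + 2 * rule_bits(3),      # level + two rules from {H0, L0, X}
    }
```

Calling `overhead_bits` with 16, 20, and 32 reproduces the three rows of Table 6.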


5 Conclusions 


We have shown that ESRBDDs are a simple, yet efficient, generalization of 
previous attempts at combining reduction rules. Unlike previous efforts, they are 
not biased towards any particular reduction rule and therefore eliminate the need 
for the user to prioritize the reduction rules. They also provide a framework for 
further generalizations through additional reduction rules, for example, 
“high-one” and “low-one”, the duals of “low-zero” and “high-zero” respectively. 

ESRBDDs allow users to select a subset of reduction rules that suit their 
needs, and make it possible to integrate domain-specific reduction rules (a common 
phenomenon) with a subset of existing ones. ESRBDD nodes are also more 
compact than all previous such efforts, and new reduction rules can be added 
at a small cost: $\lceil \log_2 n \rceil$ bits per edge, where n is the number of reduction rules. 
Our future efforts will be directed towards adapting BDD manipulation operations 
(such as Apply) to work with the reduction rules in ESRBDDs, and towards 
including complement edges and other reduction rules, such as “high-one”, 
“low-one”, or “identity” reductions, while maintaining canonicity. 

Abstract. Symbolic-Heap Separation logic is a popular formalism for 
automated reasoning about heap-manipulating programs, which allows 
the user to give customized data structure definitions. 

In this paper, we give a new decidability proof for the separation logic 
fragment of Iosif, Rogalewicz and Simacek. We circumvent the reduction 
to MSO from their proof and provide a direct model-theoretic construction 
with elementary complexity. We implemented our approach in the 
Harrsh analyzer and evaluate its effectiveness. In particular, we show that 
Harrsh can decide the entailment problem for data structure definitions 
for which no previous decision procedures have been implemented. 


1 Introduction 


Separation logic (SL) [12,18] is a popular formalism for Hoare-style verification 
of imperative, heap-manipulating programs. In particular, the symbolic heap 
separation logic fragment has received a lot of attention: Symbolic heaps serve as 
the basis of various automated verification tools, such as INFER [6], SLEEK [7], 
SONGBIRD [19], GRASSHOPPER [17], VCDRYAD [16], VERIFAST [13], SLS [20], 
and SPEN [9]. Many of the aforementioned tools rely on systems of inductive 
predicate definitions (SID) that serve as specifications of dynamic data structures, 
e.g., linked lists and trees. 

At the heart of every Hoare-style verification procedure based on separation 
logic lies the entailment problem: Given two SL formulas, say $\varphi$ and $\psi$, is every 
model of $\varphi$ also a model of $\psi$? While the entailment problem is undecidable in 
general [2], there are various approaches to decide entailments between symbolic 
heaps ranging from complete methods for fixed SIDs [3], over decision procedures 
for restricted classes of SIDs [10,11], to incomplete approaches, such as 
fold/unfold reasoning [7] or cyclic proofs [5]. 

Among the largest decidable fragments of symbolic heaps with inductive 
definitions is the fragment of symbolic heaps with bounded tree-width ($\mathrm{SL}_{btw}$) [10]. 
This fragment supports a rich class of data structures in SID specifications, 
such as doubly-linked lists and binary trees with linked leaves. $\mathrm{SL}_{btw}$ introduces 
three syntactic conditions on SIDs (progress, connectivity, and establishment) 
that enable a reduction from the entailment problem for $\mathrm{SL}_{btw}$ to the (decidable) 
satisfiability problem for monadic second-order logic (MSO) over graphs 
of bounded tree width. This gives rise to a decision procedure of non-elementary 
complexity, at least without an in-depth analysis of the quantifier alternations 
involved in the reduction. The reduction to MSO is also technically involved 
and has, to the best of our knowledge, never been implemented. The authors 
remark in the follow-up paper [11] that “the method from [10] causes a blowup 
of several exponentials in the size of the input problem and is unlikely to produce 
an effective decision procedure.” 

$$\begin{array}{lcl}
\mathtt{tree}(x_1, x_2) & \Leftarrow & x_1 \mapsto (null, null, x_2) \\
\mathtt{tree}(x_1, x_2) & \Leftarrow & \exists \ell, r :\ x_1 \mapsto (\ell, r, x_2) * \mathtt{tree}(\ell, x_1) * \mathtt{tree}(r, x_1) \\[4pt]
\mathtt{rtree}(x_1, x_2, x_3) & \Leftarrow & \exists r :\ x_1 \mapsto (x_3, r, x_2) * \mathtt{parent}(x_3, x_1) * \mathtt{tree}(r, x_1) \\
\mathtt{parent}(x_1, x_2) & \Leftarrow & x_1 \mapsto (null, null, x_2) \\
\mathtt{rtree}(x_1, x_2, x_3) & \Leftarrow & \exists \ell, r :\ x_1 \mapsto (\ell, r, x_2) * \mathtt{rtree}(\ell, x_1, x_3) * \mathtt{tree}(r, x_1) \\[4pt]
\mathtt{ltree}(x_1, x_2, x_3) & \Leftarrow & \exists p :\ x_1 \mapsto (null, null, p) * \mathtt{lltree}(p, x_1, x_2, x_3) \\
\mathtt{lltree}(x_1, x_2, x_3, x_4) & \Leftarrow & \exists r :\ x_1 \mapsto (x_2, r, x_3) * \mathtt{tree}(r, x_1) * \mathtt{lroot}(x_3, x_1, x_4) \\
\mathtt{lltree}(x_1, x_2, x_3, x_4) & \Leftarrow & \exists r, p :\ x_1 \mapsto (x_2, r, p) * \mathtt{lltree}(p, x_1, x_3, x_4) * \mathtt{tree}(r, x_1) \\
\mathtt{lroot}(x_1, x_2, x_3) & \Leftarrow & \exists r :\ x_1 \mapsto (x_2, r, x_3) * \mathtt{tree}(r, x_1)
\end{array}$$

Fig. 1. An SID $\Phi$ with three predicates for binary trees with parent pointers. 


Contributions. We give a new proof for the decidability of the entailment problem 
for the $\mathrm{SL}_{btw}$ fragment. In contrast to [10], we circumvent the reduction 
to MSO and give a direct model-theoretic construction with elementary complexity. 
This yields an easy-to-implement decision procedure for entailments in 
the full $\mathrm{SL}_{btw}$ fragment. We implemented our approach in the Harrsh analyzer 
and report on promising results for challenging examples (Sect. 6). In particular, 
we show that Harrsh can decide the entailment problem for data structure 
definitions for which no previous decision procedures have been implemented. 


A challenging example. To highlight the challenges faced when developing and 
implementing decision procedures for entailments in $\mathrm{SL}_{btw}$, consider the SID $\Phi$ 
consisting of the rules in Fig. 1 (the syntax and semantics of SIDs are defined 
formally in Sect. 3). There are three predicates, namely tree, rtree, 
and ltree, that specify binary trees with parent pointers (treep for short). The 
predicate tree takes two parameters representing the root of the tree and its 
parent. Predicates rtree and ltree both have the leftmost leaf of the tree as 
an additional parameter. Such a parameter may, for example, be required to 
specify tree segments for an automated program analysis. Although both rtree 
and ltree describe treeps, they take radically different approaches: Predicate 
rtree defines a treep starting at the root, i.e., it specifies the root of the treep 
and then states that both subtrees are treeps (the parameter representing the 
leftmost leaf is additionally passed to the left subtree). In contrast, predicate 
ltree specifies treeps starting at the leftmost leaf and moving up to the root. 
Consequently, the models of these predicates are derived in completely different 
ways, which is a challenge for commonly applied approaches, such as fold/unfold 
(cf. [7]) or inductive reasoning (cf. [5,19,20]). In fact, the entailment 
$\mathtt{ltree}(x_1, x_2, x_3) \models_\Phi \mathtt{rtree}(x_2, x_3, x_1)$ holds, whereas the entailment 
$\mathtt{rtree}(x_2, x_3, x_1) \models_\Phi \mathtt{ltree}(x_1, x_2, x_3)$ is violated: Intuitively, rtree admits 
models in which all shortest paths from the root to the leftmost leaf have length one. 
In contrast, for ltree, the minimal length of all shortest paths is two. Thus, the 
heap illustrated in Fig. 2 is a model of rtree, but not of ltree. In fact, if we 
rule out this model, rtree and ltree entail each other. That is, the entailment 
below and its converse are both valid: 

$$\mathtt{ltree}(x_1, x_2, x_3) \models_\Phi \exists \ell, r :\ x_2 \mapsto (\ell, r, x_3) * \mathtt{rtree}(\ell, x_2, x_1) * \mathtt{tree}(r, x_2) \qquad (\dagger)$$

HARRSH solved the entailment ($\dagger$) from above in less than a second. The only 
other tool capable of successfully solving ($\dagger$) is SLIDE [11], which is based on 
tree automata. However, the approach in [11] is not complete for $\mathrm{SL}_{btw}$. 

Fig. 2. treep 


Overview of our approach. We first present an algebra à la Courcelle [8] to sys- 
tematically construct models of separation logic formulas (Sect. 2). This algebra 
enables us to conveniently formalize the semantics of separation logic (Sect. 3). 
To decide entailments, we then develop an abstraction mechanism for models 
with the following properties (Sect. 4): 


1. The abstraction is compositional, i.e., we can perform our algebraic operations 
on abstractions instead of models (Theorem 2). 

2. The abstraction is finite, i.e., each model of a predicate is abstracted to one 
of finitely many abstractions (Lemma 3). 

3. The abstraction refines the predicate satisfaction relation, i.e., models with 
the same abstraction entail the same predicates among those relevant for the 
entailment (Lemma 2). 

4. The abstraction is effective, i.e., for a given abstraction, one can determine 
which predicates are entailed (Theorem 3). 


How do we obtain a decision procedure from these properties for an entailment, 
say $pred_1(x_1) \models_\Phi pred_2(x_2)$? We iteratively compute all abstractions 
corresponding to models of $pred_1(x_1)$. Due to compositionality (1), this can be 
achieved by applying the same operations used to construct models on previously 
computed abstractions until a fixed point is reached. Finiteness of the abstraction 
(2) ensures termination. We then exploit that the abstraction is well-defined 
(3) and effective (4) to decide the entailment: $pred_1(x_1) \models_\Phi pred_2(x_2)$ holds iff 
all computed abstractions of models of $pred_1(x_1)$ yield that they are also models 
of $pred_2(x_2)$ (Sect. 5). 
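
The shape of this fixed-point computation can be sketched on a toy example. The Python code below is not the abstraction developed in this paper: as a stand-in, it abstracts a list segment by its length capped at 2, which is finite and compositional in the simple sense needed here. The names `abstractions_of`, `entails`, `toy_rules`, and `CAP` are all hypothetical.

```python
from itertools import product

CAP = 2  # toy abstraction: list length, capped so that the domain is finite


def abstractions_of(rules, predicate):
    """Least fixed point: all abstractions of models of `predicate`.

    rules maps a predicate name to a list of (callees, combine) pairs, where
    callees is a tuple of predicate names called by the rule and combine maps
    one abstraction per callee to the abstraction of the composed model.
    """
    known = {name: set() for name in rules}
    changed = True
    while changed:  # terminates because only finitely many abstractions exist
        changed = False
        for name, alternatives in rules.items():
            for callees, combine in alternatives:
                # product() copies its inputs, so adding to `known` is safe.
                for args in product(*(known[c] for c in callees)):
                    abstraction = combine(*args)
                    if abstraction not in known[name]:
                        known[name].add(abstraction)
                        changed = True
    return known[predicate]


def entails(rules, lhs, accepts):
    """lhs entails the right-hand side iff every abstraction is accepted."""
    return all(accepts(a) for a in abstractions_of(rules, lhs))


toy_rules = {
    "sll": [
        ((), lambda: 1),                               # a single pointer
        (("sll",), lambda rest: min(rest + 1, CAP)),   # a pointer, then a list
    ],
}
```

In this toy setting, `entails(toy_rules, "sll", lambda a: a >= 1)` holds, whereas requiring length at least 2 fails because the abstraction 1 is reachable; this mirrors, in a much simplified form, how a single small model refutes an entailment.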
Due to space constraints, all proofs are in the supplementary material [1]. 

Notation. The set of all (non-empty) finite sequences over a set S is $S^*$ ($S^+$). 
Bold letters denote sequences, e.g., $\mathbf{x} = (x_1, \ldots, x_k)$. $\mathbf{x}[i]$ refers to the i-th 
element of $\mathbf{x}$. We often treat sequences as sets, i.e., we write $y \in \mathbf{x}$ if y occurs in $\mathbf{x}$, 
$\mathbf{x} \cup \mathbf{z}$ for the set of all elements in $\mathbf{x}$ or $\mathbf{z}$, etc. $f = \{x_1 \mapsto y_1, \ldots, x_n \mapsto y_n\}$ is the 
function given by $f(x_i) = y_i$ for $i \in [1, n]$, $n \ge 0$. Moreover, functions $f : X \to Y$ 
are lifted to functions on sequences $f : X^* \to Y^*$ by pointwise application. 


Fig. 3. A heap graph modeling a list segment of length at least 5 from $x_1$ to $x_2$. 


2 Heap Graphs 


Separation logic is typically interpreted in terms of stack-heap pairs consisting 
of a stack, i.e., an evaluation of variables, and a heap, i.e., a finite mapping from 
memory locations to values. In our setting, however, it is more convenient to 
abstract from locations and consider labeled graphs. 

Formally, let Var be a set of variables containing a special variable $null \in Var$. 
Moreover, let Preds be a set of predicate identifiers; each predicate $pred \in Preds$ 
is equipped with an arity $ar(pred) \in \mathbb{N}$. $pred(\mathbf{x})$ is a predicate call if the 
length of sequence $\mathbf{x} \in Var^*$ is ar(pred). 


Definition 1 (Heap Graph). A heap graph M = (Ptr, FV, calls) is a graph 
whose nodes are a finite set of variables in Var. The edges of M are given by a 
partial points-to function $Ptr : Var \setminus \{null\} \rightharpoonup_{finite} Var^+$ mapping variables to 
finite tuples of variables. Moreover, $FV \subseteq Var$ is a finite set of free variables 
and calls is a finite set of predicate calls. A heap graph is concrete if $calls = \emptyset$. 
We collect all variables in Ptr, FV, and calls in vars(M). Finally, we write $Ptr_M$, 
$FV_M$, and $calls_M$ to refer to the individual components of heap graph M. 

Example 1. Figure 3 depicts a heap graph modeling a singly-linked list of length 
at least five with head $x_1$ and tail $x_2$ (assuming the predicate call $\mathtt{sll}(d, x_2)$ 
stands for non-empty list segments from d to $x_2$; see the left part of Fig. 5). 
In our graphical notation, every node corresponds to the variable it is labeled 
with. Gray nodes correspond to the free variables in FV. For each variable, say 
x, the pointers $Ptr(x) = (y_1, \ldots, y_k)$ are represented by directed edges, labeled 
with the position 1, 2, ..., k, from the node labeled with x to nodes labeled with 
$y_1, \ldots, y_k$, respectively. We usually omit the edge labels if each node has at most 
one outgoing edge. Finally, a predicate call is drawn as a box labeled with the 
predicate call and connected to the nodes representing the variables occurring 
in the call's parameters. Formally, the heap graph in Fig. 3 is given by M = 
(Ptr, FV, calls) with points-to mapping $Ptr = \{x_1 \mapsto a, a \mapsto b, b \mapsto c, c \mapsto d\}$, free 
variables $FV = \{x_1, x_2\}$ and predicate calls $calls = \{\mathtt{sll}(d, x_2)\}$. 

Fig. 4. Illustration of composition of two heap graphs. 
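
The heap graph of Example 1 can be written down directly as a small data structure. The Python sketch below is one possible encoding, not the representation used in Harrsh: the class name `HeapGraph`, its field names, and the encoding of predicate calls as (name, arguments) pairs are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeapGraph:
    ptr: dict          # points-to function: variable -> tuple of variables
    fv: frozenset      # free variables
    calls: frozenset   # predicate calls as (name, argument tuple) pairs

    def is_concrete(self):
        """A heap graph is concrete if it contains no predicate calls."""
        return not self.calls

    def variables(self):
        """vars(M): all variables occurring in Ptr, FV, or calls."""
        found = set(self.fv)
        for source, targets in self.ptr.items():
            found.add(source)
            found.update(targets)
        for _, args in self.calls:
            found.update(args)
        return found


# The heap graph of Example 1 (Fig. 3).
example1 = HeapGraph(
    ptr={"x1": ("a",), "a": ("b",), "b": ("c",), "c": ("d",)},
    fv=frozenset({"x1", "x2"}),
    calls=frozenset({("sll", ("d", "x2"))}),
)
```

Here `example1` is not concrete, since it still contains the call to `sll`, and its variables are x1, x2, a, b, c, and d.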


Heap graphs are an abstraction of the classical stack-heap model. To reason 
about separation logic with heap graphs (and their abstractions), we need a few 
operations for their systematic construction: Let $f : Var \rightharpoonup Var$ be a partial 
function and f(M) its application to every variable in every component of M. 


Isomorphic heap graphs. We call a variable x ∈ Var an auxiliary variable of
heap graph M if x is not a free variable of M. Throughout this article, we
do not distinguish between isomorphic heap graphs, i.e., heap graphs that are
identical up to renaming of auxiliary variables. Formally, two heap graphs M1
and M2 are isomorphic, written M1 ≅ M2, if there exists a bijective function
f: vars(M1) → vars(M2) such that (1) FV_M1 = FV_M2, (2) f(x) = x for all
x ∈ FV_M1, and (3) f(M1) = M2.


Renaming heap graphs. Our first operation enables renaming of free variables.
Formally, let M be a heap graph and x ∈ FV_M*, y ∈ Var* be repetition-free
sequences of variables of the same length. Then the renaming of x to y in M is
given by rename_{x,y}(M) = f(M), where

\[
f\colon \mathit{Var} \to \mathit{Var}, \qquad
z \mapsto
\begin{cases}
\mathbf{y}[i] & \text{if } z = \mathbf{x}[i] \text{ for some } i, \\
z & \text{otherwise.}
\end{cases}
\]

Composition. Our next operation allows composing heap graphs by “gluing”
them together at their common free variables. Formally, let M1, M2 be heap
graphs such that (1) vars(M1) ∩ vars(M2) ⊆ FV_M1 ∩ FV_M2 and (2) Ptr_M1
and Ptr_M2 are domain disjoint, i.e., dom(Ptr_M1) ∩ dom(Ptr_M2) = ∅. Then the
componentwise union M1 ∪ M2 of M1 and M2 is (Ptr_M1 ∪ Ptr_M2, FV_M1 ∪
FV_M2, calls_M1 ∪ calls_M2). Otherwise, M1 ∪ M2 is undefined. We then define the
composition M1 • M2 of heap graphs M1, M2 as

\[
M_1 \bullet M_2 \;:=\;
\begin{cases}
M_1 \cup M' & \text{where } M' \cong M_2 \text{ and } M_1 \cup M' \text{ is defined,} \\
\text{undefined} & \text{otherwise.}
\end{cases}
\]


Example 2. Figure 4 depicts the composition of two heap graphs representing
lists of length two. Since both heap graphs share a variable a ∉ FV, we first
compute an isomorphic heap graph in which variable a is substituted by c in the
second graph. Both heap graphs are then merged at their common free variable
b. This results in a heap graph modeling a list of length four. △


Forgetting free variables. To construct larger heap graphs from smaller ones,
we often need additional free variables to glue the right nodes together, e.g.,
the variable b in Example 2. Consequently, we need a mechanism for subsequent
removal of these variables from the set of free variables. To this end, for every
heap graph M and sequence of free variables x ∈ FV_M*, we define the operation
forget_x(M) = (Ptr_M, FV_M \ x, calls_M).


Single allocations. The simplest non-empty heap graph is a single variable, say
x, with pointers to a sequence y of finitely many other variables. We write x ↦ y
to denote this single-allocation heap graph ({x ↦ y}, {x} ∪ y, ∅).


Theorem 1 ([8]). Every non-empty heap graph of tree width at most k can
be constructed from heap graphs x ↦ y, renaming, composition, and forgetting
using at most k + 1 free variables.
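
As an illustration, the four operations of this section can be sketched in a few lines of Python. The class HeapGraph and the functions alloc, rename, compose, and forget are hypothetical names chosen for this sketch; they are not taken from any implementation. The sketch assumes that the auxiliary variables of the two arguments of compose have already been renamed apart, so it does not search for an isomorphic copy as the definition of • does.

```python
from dataclasses import dataclass


@dataclass
class HeapGraph:
    ptr: dict         # Ptr: variable -> tuple of target variables
    fv: frozenset     # FV: free variables
    calls: frozenset  # predicate calls as (name, parameter tuple) pairs

    def variables(self):
        """vars(M): all variables occurring in Ptr, FV, and calls."""
        vs = set(self.fv)
        for source, targets in self.ptr.items():
            vs.add(source)
            vs.update(targets)
        for _, params in self.calls:
            vs.update(params)
        return vs


def alloc(x, ys):
    """Single allocation x -> ys, i.e. ({x -> ys}, {x} u ys, {})."""
    return HeapGraph({x: tuple(ys)}, frozenset([x, *ys]), frozenset())


def rename(m, xs, ys):
    """rename_{xs,ys}(M): replace free variable xs[i] by ys[i] everywhere."""
    assert len(xs) == len(ys) and set(xs) <= m.fv
    f = dict(zip(xs, ys))

    def r(v):
        return f.get(v, v)

    return HeapGraph(
        {r(s): tuple(r(t) for t in ts) for s, ts in m.ptr.items()},
        frozenset(r(v) for v in m.fv),
        frozenset((p, tuple(r(v) for v in args)) for p, args in m.calls),
    )


def compose(m1, m2):
    """Componentwise union of m1 and m2, or None if it is undefined.

    Assumption of this sketch: auxiliary variables are already disjoint.
    """
    shared = m1.variables() & m2.variables()
    if not shared <= (m1.fv & m2.fv):
        return None  # the graphs share a variable that is not free in both
    if m1.ptr.keys() & m2.ptr.keys():
        return None  # some variable would be allocated twice
    return HeapGraph({**m1.ptr, **m2.ptr}, m1.fv | m2.fv, m1.calls | m2.calls)


def forget(m, xs):
    """forget_xs(M): drop xs from the set of free variables."""
    return HeapGraph(dict(m.ptr), m.fv - set(xs), m.calls)


# A list of length two from x1 to x2, glued at the temporary free variable b.
two = forget(compose(alloc("x1", ["b"]), alloc("b", ["x2"])), ["b"])
```

Here two is built exactly in the style of Theorem 1: two single allocations are composed at the shared free variable b, which is forgotten afterwards.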


3 Symbolic Heap Separation Logic 


We consider the symbolic heap fragment of separation logic with user-defined 
inductive predicate definitions. We omit pure formulas to simplify the presenta- 
tion. Notice, however, that our implementation supports reasoning about sym- 
bolic heaps with pure formulas. 


Syntax. The syntax of our simplified symbolic heap fragment is then given by
the following context-free grammar:

\[
\varphi ::= \mathsf{emp} \mid x \mapsto \mathbf{y} \mid \mathit{pred}(\mathbf{y}) \mid \exists x\colon \varphi \mid \varphi * \varphi,
\]

where x ∈ Var \ {null} is a variable, y ∈ Var⁺ is a sequence of variables,
and pred(y) is a predicate call. Here, emp is the empty heap, x ↦ y asserts
that x points to the locations captured by y, ∃x: φ is existential quantification,
and * is the separating conjunction. Because * is commutative and associative
and because existential quantifiers can always be moved to the front, we will
always consider symbolic heaps to be of the form

\[
\exists \mathbf{y}\colon (x_1 \mapsto \mathbf{y}_1) * \cdots * (x_m \mapsto \mathbf{y}_m) * \mathit{pred}_1(\mathbf{z}_1) * \cdots * \mathit{pred}_n(\mathbf{z}_n).
\]


Inductive definitions. Before we assign formal semantics to symbolic heaps, we
clarify how custom predicates are specified. To this end, a system of inductive
definitions (SID) is a finite set Φ of rules of the form pred ⇐ φ, where pred ∈
Preds is a predicate symbol and φ is a symbolic heap. We assume that all
symbolic heaps of rules with head pred have the same sequence of free variables
(x1, ..., x_ar(pred))² and collect these variables in the set fv(pred). Moreover, we
collect all predicates that occur in SID Φ in the set Preds(Φ) and all rules of
SID Φ in the set Rules(Φ). Examples of SIDs are found in Figs. 1 and 5.


² A variable is in the set fv(φ) of free variables of φ if it is not bound by a quantifier.


Semantics. We define the semantics of symbolic heaps φ for a given SID Φ
in terms of a force relation ⊨_Φ, which determines whether a heap graph M
satisfies φ. To this end, let φ[x/y] denote the symbolic heap φ in which every free
occurrence of variable x[i] is substituted by variable y[i], where 1 ≤ i ≤ |x| = |y|.
Then the relation ⊨_Φ is defined inductively on the syntax of symbolic heaps:

\[
\begin{array}{lcl}
M \models_\Phi \mathsf{emp} & \text{iff} & \text{ex. } \mathbf{x} \in \mathit{Var}^* \text{ s.t. } M = (\emptyset, \mathbf{x}, \emptyset) \\
M \models_\Phi x \mapsto \mathbf{y} & \text{iff} & \text{ex. } \mathbf{z} \supseteq \{x\} \cup \mathbf{y} \text{ s.t. } M = (\{x \mapsto \mathbf{y}\}, \mathbf{z}, \emptyset) \\
M \models_\Phi \mathit{pred}(\mathbf{y}) & \text{iff} & \text{ex. } \mathbf{z} \supseteq \mathbf{y} \text{ s.t. } M \cong (\emptyset, \mathbf{z}, \{\mathit{pred}(\mathbf{y})\}) \\
 & & \text{or ex. } (\mathit{pred} \Leftarrow \varphi) \in \mathit{Rules}(\Phi) \text{ s.t. } M \models_\Phi \varphi[\mathit{fv}(\mathit{pred})/\mathbf{y}] \\
M \models_\Phi \exists x\colon \varphi & \text{iff} & \text{ex. } y \in \mathit{Var} \text{ s.t. } (\mathit{Ptr}_M, \mathit{FV}_M \cup \{y\}, \mathit{calls}_M) \models_\Phi \varphi[x/y] \\
M \models_\Phi \varphi_1 * \varphi_2 & \text{iff} & \text{ex. } M_1, M_2 \text{ s.t. } M = M_1 \bullet M_2 \text{ and } M_1 \models_\Phi \varphi_1 \text{ and } M_2 \models_\Phi \varphi_2
\end{array}
\]


The above semantics coincides with the standard least fixed-point semantics of 
symbolic heaps (cf. [4]) for stack-heap pairs if we restrict ourselves to concrete 
heap graphs. Moreover, there is a strong relationship between our SL semantics 
and the operations on heap graphs defined in Sect. 2. 

Lemma 1. Let φ = ∃y: (x1 ↦ y1) * ... * (xm ↦ ym) * pred1(z1) * ... * predn(zn)
be a symbolic heap. Then M ⊨_Φ φ iff there exist M1, ..., M_{m+n} such that (1)
Mi ⊨_Φ xi ↦ yi for 1 ≤ i ≤ m, (2) M_{m+j} ⊨_Φ predj(fv(predj)) for 1 ≤ j ≤ n,
and (3) M = forget_y(M1 • ... • Mm • rename_{fv(pred1),z1}(M_{m+1}) • ... •
rename_{fv(predn),zn}(M_{m+n})).


Symbolic heaps with bounded tree-width. Our goal is to develop a decision
procedure for symbolic heaps with inductive definitions in the bounded tree-width
fragment developed by Iosif et al. [10]. This fragment imposes three conditions
on SIDs, which we assume for all SIDs Φ considered in the following:


1. Progress: Every rule allocates exactly one variable x, i.e., every rule contains
exactly one points-to assertion x ↦ y.

2. Connectivity: Every predicate call pred(z) of a rule has a parameter z[i] that
is referenced by the rule's allocated variable, i.e., z[i] ∈ y. Moreover, the i-th
free variable of predicate pred must be allocated in all rules pred ⇐ φ of Φ.

3. Establishment: All existentially quantified variables are eventually allocated.
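
The first two conditions are purely syntactic and can be checked rule by rule. The following Python sketch does so for the list SID of Fig. 5. The rule encoding (a head name plus a list of "pto" and "call" atoms) and the function names are assumptions of this sketch, and only the first half of the connectivity condition is checked; the requirement on the i-th free variable of the called predicate and the establishment condition are omitted.

```python
# Each rule is (head predicate, atoms). An atom is either
# ("pto", x, ys) for x -> ys or ("call", pred, args) for pred(args).
PHI_LISTS = [
    ("sll", [("pto", "x1", ("x2",))]),
    ("sll", [("pto", "x1", ("y",)), ("call", "sll", ("y", "x2"))]),
    ("odd", [("pto", "x1", ("x2",))]),
    ("odd", [("pto", "x1", ("y",)), ("call", "even", ("y", "x2"))]),
    ("even", [("pto", "x1", ("y",)), ("call", "odd", ("y", "x2"))]),
]


def has_progress(rule):
    """Progress: the rule contains exactly one points-to assertion."""
    _, atoms = rule
    return sum(1 for atom in atoms if atom[0] == "pto") == 1


def is_connected(rule):
    """Connectivity (first half): every call has a parameter that is
    referenced by the allocated variable of the rule."""
    _, atoms = rule
    targets = set()
    for atom in atoms:
        if atom[0] == "pto":
            targets.update(atom[2])
    return all(set(atom[2]) & targets for atom in atoms if atom[0] == "call")


lists_ok = all(has_progress(r) and is_connected(r) for r in PHI_LISTS)
```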


Assumptions. We make two further assumptions for all SIDs throughout this
paper: (1) Predicates are called with pairwise different parameters. (2) Unfolding
predicates (iteratively substituting predicate calls pred(y) with the right-hand
sides φ[fv(pred)/y] of rules pred ⇐ φ) always yields satisfiable symbolic heaps.
SIDs can be transformed automatically to satisfy (1) and (2) before applying our
decision procedure (cf. [1,14]). The SIDs in Figs. 1 and 5 satisfy all assumptions.


4 Profiles: An Abstraction for Concrete Heap Graphs


Entailment problem. We present our approach for entailments pred1(x) ⊨_Φ
pred2(y) between predicate calls pred1(x) and pred2(y) of an SID Φ. We discuss
the treatment of more general entailments at the end of Sect. 5. Formally, the
entailment pred1(x) ⊨_Φ pred2(y) holds iff for all concrete heap graphs M,
M ⊨_Φ pred1(x) implies M ⊨_Φ pred2(y).


Model reconstruction. Recall from Lemma 1 that M ⊨_Φ pred1(x) can be
interpreted as being able to construct M as a model of pred1(x) using the rules of
SID Φ and our operations on heap graphs introduced in Sect. 2. To prove the
entailment pred1(x) ⊨_Φ pred2(y), we then have to “reconstruct” any such M
as a model of pred2(y). Since infinitely many model reconstructions might be
required (after all, there might be infinitely many M with M ⊨_Φ pred1(x)),
we now develop an abstraction of heap graphs such that finitely many abstract
model reconstructions suffice to cover all models of pred1(x).


Running example. To sharpen our intuition, we present the technical details
of our abstraction together with a running example: Fig. 5 shows an SID
Φ_lists specifying predicates for various singly-linked list segments. The
predicate sll(x1, x2) specifies non-empty singly-linked list segments with head x1
and tail x2. Similarly, the predicates odd(x1, x2) and even(x1, x2) restrict such
list segments to odd and even length, respectively. In the remainder of this
and the next section, we will use our abstraction to show that the entailment
sll(x1, x2) ⊨_{Φ_lists} odd(x1, x2) does not hold.


sll(x1, x2) ⇐ x1 ↦ x2                     odd(x1, x2)  ⇐ x1 ↦ x2
sll(x1, x2) ⇐ ∃y: x1 ↦ y * sll(y, x2)     odd(x1, x2)  ⇐ ∃y: x1 ↦ y * even(y, x2)
                                          even(x1, x2) ⇐ ∃y: x1 ↦ y * odd(y, x2)


Fig. 5. SIDs Φ_sll (left) and Φ_o/e (right) specifying singly-linked list segments
with head x1 and tail x2. Moreover, we define Φ_lists = Φ_sll ∪ Φ_o/e.


4.1 Context Profiles as an Abstract Domain 


Contexts. Our proposed abstraction is based on contexts. Intuitively, every
context describes an extension of a concrete heap graph by predicate calls such that
the resulting graph satisfies a fixed predicate call. Thus, contexts reveal what is
missing in a concrete heap graph to reconstruct models of predicate calls.


Definition 2 (Context). A triple C = (V, pred(x), calls) is a context of a
concrete heap graph M w.r.t. SID Φ if (1) V = FV_M, (2) (Ptr_M, x, calls) ⊨_Φ
pred(x), and (3) neither x nor calls contain auxiliary variables of M. Moreover,
we define the set of free variables of context C as fv(C) := V. We call variables
in x or calls, but not in fv(C), the auxiliary variables of C. △


Example 3. Figure 6 shows contexts for two concrete heap graphs M_odd and
M_even of odd and even length (without dashes), respectively. The extension by
calls from the contexts is illustrated by dashed lines. Intuitively, context C1 states
that no extension of M_odd is needed to obtain a model of predicate odd(x1, x2).
Context C2 states that, in order to obtain an odd list segment from x1 to a,
where a is an additional free variable, we have to add an even list segment from
x2 to a. Similarly, by context C3, we obtain an even list segment from x1 to some
fresh variable a by adding an odd list segment from x2 to a. The interpretation of
contexts C4, C5, and C6 of M_even is analogous. △


Context decompositions. A context of heap graph M stores the free variables of
M. These variables are important because additional free variables might allow
us to split a heap graph into several smaller ones. For example, the additional
free variable b in Fig. 4 (read from right to left) allows us to decompose a list
into two lists. Since our goal is to develop a compositional abstraction, we have
to take contexts of decompositions of heap graphs into account. In general, these
decompositions are relevant for entailment when considering more complicated
SIDs, e.g., doubly-linked binary trees or trees with linked leaves. We thus have
to compute decompositions M1 • ... • Mk, k ≥ 1, of a concrete heap graph M
and then consider a context for each component.


Definition 3 (Context decomposition). A context decomposition of a concrete
heap graph M w.r.t. SID Φ is a set E = {C1, ..., Ck} such that M =
M1 • ... • Mk, k ≥ 1, is a decomposition of M and C1, ..., Ck are contexts of
the concrete heap graphs M1, ..., Mk w.r.t. Φ, respectively. Moreover, we define
the set of free variables of context decomposition E as fv(E) := ⋃_{C ∈ E} fv(C). △


C1 = ({x1, x2}, odd(x1, x2), ∅)
C2 = ({x1, x2}, odd(x1, a), {even(x2, a)})
C3 = ({x1, x2}, even(x1, a), {odd(x2, a)})
C4 = ({x1, x2}, even(x1, x2), ∅)
C5 = ({x1, x2}, even(x1, a), {even(x2, a)})
C6 = ({x1, x2}, odd(x1, a), {odd(x2, a)})

[Figure: drawings of the heap graphs M_odd and M_even and of the dashed extension
belonging to each context.]


Fig. 6. Contexts of concrete heap graphs M_odd (first graph) and M_even (fourth graph).
The extensions by a context are drawn in dashed lines.


Example 4. The concrete heap graph M_odd in Fig. 6 cannot be decomposed into
smaller graphs due to a lack of free variables. Hence, context decompositions of
M_odd are singletons consisting of C1, C2, and C3 in Fig. 6, respectively. △


Profiles. As the above example shows, concrete heap graphs may have multiple
context decompositions. We thus abstract a concrete heap graph M by the set
of all context decompositions of M:


Definition 4 (Profiles). The profile profile_Φ(M) of a concrete heap graph M
w.r.t. SID Φ is the set of all context decompositions of M w.r.t. Φ. Moreover,
since all context decompositions E ∈ P of a profile P have the same free
variables, we define the free variables of P as fv(P) := fv(E) for some E ∈ P. △


Refinement property. We propose profiles as a suitable abstraction for deciding
entailments. We will argue that they comply with the four essential correctness
properties discussed in Sect. 1: refinement, finiteness, compositionality, and
effectiveness. Refinement means that two concrete heap graphs with the same
profile satisfy the same predicate calls of the SID. Hence, for each profile and
predicate pred, it suffices to find a single model of pred with that profile. Formally,


Lemma 2. Let M, M' be concrete heap graphs with profile_Φ(M) = profile_Φ(M').
Then, for all pred ∈ Preds(Φ), we have M ⊨_Φ pred(x) iff M' ⊨_Φ pred(x).


Finiteness. In general, the set of profiles of concrete heap graphs is infinite
due to different names for additional free variables, e.g., variable a in Fig. 6.
To obtain a finite set of profiles, we thus (a) limit the total number of free
variables, (b) consider profiles up to renaming of additional free variables, and (c)
exploit the connectivity condition. Notice that condition (a) is not a restriction,
because the number of free variables for every SID and thus every entailment
query is bounded. For condition (b), we have to lift the notion of isomorphism
from heap graphs to profiles. Formally, contexts C1 = (z1, pred1(x1), calls1) and
C2 = (z2, pred2(x2), calls2) are isomorphic iff z1 = z2, pred1 = pred2, and there
exists a bijective function f: Var → Var such that (1) for all z ∈ z1, f(z) = z,
(2) f(x1) = x2, and (3) calls2 = {pred(f(y)) | pred(y) ∈ calls1}. Moreover, two
context decompositions E1, E2 are isomorphic iff for all i ∈ {1, 2} and contexts
C ∈ Ei there is a context C' ∈ E_{3-i} that is isomorphic to C. Analogously, two
profiles P1, P2 are isomorphic iff for all i ∈ {1, 2} and context decompositions
E ∈ Pi there exists a context decomposition E' ∈ P_{3-i} that is isomorphic to E.

Throughout this paper, we do not distinguish between isomorphic contexts,
context decompositions, or profiles.
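
To make the notion of context isomorphism concrete, the following Python sketch checks it by brute force. A context (z, pred(x), calls) is encoded as a triple of a frozenset of free variables, a (name, arguments) pair, and a frozenset of such pairs; this encoding and the function names are assumptions of the sketch. Since f must fix all free variables and be bijective, it suffices to try every bijection between the auxiliary variables of the two contexts, which is exponential in their number and only meant for illustration.

```python
from itertools import permutations


def aux_vars(ctx):
    """Variables in the root call or in calls that are not free in ctx."""
    free, (_, root_args), calls = ctx
    used = set(root_args)
    for _, args in calls:
        used.update(args)
    return sorted(used - free)


def isomorphic(c1, c2):
    """Context isomorphism: same free variables, same root predicate, and
    a bijection on auxiliary variables mapping c1 onto c2."""
    free1, (pred1, args1), calls1 = c1
    free2, (pred2, args2), calls2 = c2
    if free1 != free2 or pred1 != pred2:
        return False
    aux1, aux2 = aux_vars(c1), aux_vars(c2)
    if len(aux1) != len(aux2):
        return False
    for image in permutations(aux2):
        f = dict(zip(aux1, image))
        renamed_root = tuple(f.get(v, v) for v in args1)
        renamed_calls = frozenset(
            (p, tuple(f.get(v, v) for v in a)) for p, a in calls1
        )
        if renamed_root == args2 and renamed_calls == calls2:
            return True
    return False


FREE = frozenset({"x1", "x2"})
# Context C2 of Fig. 6 and a copy that uses b instead of a.
C2_a = (FREE, ("odd", ("x1", "a")), frozenset({("even", ("x2", "a"))}))
C2_b = (FREE, ("odd", ("x1", "b")), frozenset({("even", ("x2", "b"))}))
# Context C6 of Fig. 6, which differs in its predicate call.
C6_a = (FREE, ("odd", ("x1", "a")), frozenset({("odd", ("x2", "a"))}))
```

With these definitions, isomorphic(C2_a, C2_b) holds, whereas isomorphic(C2_a, C6_a) does not.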


Lemma 3. For every SID Φ and variable sequence x ∈ Var*, the set of profiles
Profiles^x(Φ) = {profile_Φ(M) | M concrete heap graph, fv(profile_Φ(M)) ⊆ x} is
finite up to profile isomorphism.


Example 5. Recall from Fig. 5 the SID Φ_o/e. Moreover, recall from Fig. 6 the
concrete heap graphs M_odd and M_even and their contexts C1, C2, C3 and C4,
C5, C6, respectively. Then the profiles of M_odd and M_even w.r.t. Φ_o/e are (up to
isomorphism) profile_{Φ_o/e}(M_odd) = {{C1}, {C2}, {C3}} and profile_{Φ_o/e}(M_even) =
{{C4}, {C5}, {C6}}. In fact, the profile of every singly-linked list segment from x1
to x2 of odd (even) length is isomorphic to profile_{Φ_o/e}(M_odd) (profile_{Φ_o/e}(M_even)).
Hence, the profile of every model of the singly-linked list predicate sll(x1, x2)
is either profile_{Φ_o/e}(M_odd) or profile_{Φ_o/e}(M_even). △


4.2 Computation of Profiles 


Due to Lemmas 2 and 3, we can decide an entailment pred1(x) ⊨_Φ pred2(x)
once the profiles of all models of pred1(x) with respect to the rules relevant
for pred2(x) are known. The key insight underlying our entailment checker is
that profiles can be computed automatically in a compositional manner. To this
end, recall from Theorem 1 that every concrete heap graph can be constructed
from single-allocation heap graphs x ↦ y by means of renaming, forgetting,
and composition. We exploit this by (1) devising an algorithm to compute
profile_Φ(x ↦ y) and (2) lifting the operations rename_{x,y}, forget_x, and • for
renaming, forgetting, and composition of heap graphs to corresponding operations
rename_{x,y}, forget_x, and ⊙ on profiles.


Profiles of single allocations. Since single allocations x ↦ y cannot be further
decomposed, every context decomposition of x ↦ y w.r.t. an SID Φ is a singleton.
Due to the progress condition, every rule of Φ contains exactly one points-to
assertion. For each SID rule pred ⇐ ∃z: x' ↦ y' * pred1(y1) * ... * predk(yk),
the corresponding context ({x'} ∪ y', pred(x), {pred1(y1), ..., predk(yk)}) must
be in the profile of x ↦ y iff x ↦ y is a model of ∃z: x' ↦ y'. Hence:


Lemma 4. Profiles of single allocations, i.e., profile_Φ(x ↦ y), are computable.


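Rename for profiles. We lift the operation rename_{x,y}, which renames each
variable in x to the corresponding variable in y according to their position, from
heap graphs to contexts, context decompositions, and profiles by componentwise
application. That is, for a context C = (z, pred(u), calls), a context decomposition
E, and a profile P, we define:

\[
\begin{array}{lcl}
\mathit{rename}_{\mathbf{x},\mathbf{y}}(C) & := & (\mathit{rename}_{\mathbf{x},\mathbf{y}}(\mathbf{z}),\ \mathit{pred}(\mathit{rename}_{\mathbf{x},\mathbf{y}}(\mathbf{u})),\ \{\mathit{pred}'(\mathit{rename}_{\mathbf{x},\mathbf{y}}(\mathbf{v})) \mid \mathit{pred}'(\mathbf{v}) \in \mathit{calls}\}) \\
\mathit{rename}_{\mathbf{x},\mathbf{y}}(E) & := & \{\mathit{rename}_{\mathbf{x},\mathbf{y}}(C) \mid C \in E\} \\
\mathit{rename}_{\mathbf{x},\mathbf{y}}(P) & := & \{\mathit{rename}_{\mathbf{x},\mathbf{y}}(E) \mid E \in P\}
\end{array}
\]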


Forget for profiles. Next, we lift the operation forget_x, which removes variables in
x from the set of free variables, to contexts, context decompositions, and profiles.
For a profile, forgetting a free variable means that some of its constituting context
decompositions do not have to be considered anymore, because the composition
of their underlying models is no longer defined. Hence, these decompositions are
removed. Formally, for a context C = (z, pred(u), calls), a context decomposition
E, and a profile P, we define:

\[
\begin{array}{lcl}
\mathit{forget}_{\mathbf{x}}(C) & := & (\mathbf{z} \setminus \mathbf{x},\ \mathit{pred}(\mathbf{u}),\ \mathit{calls}) \\
\mathit{forget}_{\mathbf{x}}(E) & := & \{\mathit{forget}_{\mathbf{x}}(C) \mid C \in E\} \\
\mathit{forget}_{\mathbf{x}}(P) & := & \{\mathit{forget}_{\mathbf{x}}(E) \mid E \in P \text{ and } \mathbf{x} \cap \mathit{usedvs}(E) = \emptyset\} \\
\mathit{usedvs}(E) & := & \bigcup_{C \in E} \mathit{usedvs}(C) \\
\mathit{usedvs}(C) & := & \mathbf{u} \cup \bigcup_{\mathit{pred}'(\mathbf{y}) \in \mathit{calls}} \mathbf{y}
\end{array}
\]
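
The lifted rename and forget operations translate almost literally into code. The Python sketch below encodes a context as a triple (frozenset of free variables, (name, arguments), frozenset of calls), a context decomposition as a frozenset of contexts, and a profile as a frozenset of decompositions. The encoding, the function names, and the sample profile P are assumptions made for illustration; P is a hand-written fragment in the style of a two-cell list from x1 via b to x2 and is not claimed to be a complete profile.

```python
def ctx(free, root, calls=()):
    """Build a context (free variables, root call, calls)."""
    return (frozenset(free), root, frozenset(calls))


def rename_ctx(c, xs, ys):
    """rename_{xs,ys} applied to every component of a context."""
    f = dict(zip(xs, ys))
    free, (pred, args), calls = c
    return (
        frozenset(f.get(v, v) for v in free),
        (pred, tuple(f.get(v, v) for v in args)),
        frozenset((p, tuple(f.get(v, v) for v in a)) for p, a in calls),
    )


def rename_profile(profile, xs, ys):
    """Componentwise renaming of all decompositions of a profile."""
    return frozenset(
        frozenset(rename_ctx(c, xs, ys) for c in dec) for dec in profile
    )


def used_vars(dec):
    """usedvs(E): variables in root calls and calls of all contexts."""
    used = set()
    for _, (_, args), calls in dec:
        used.update(args)
        for _, a in calls:
            used.update(a)
    return used


def forget_profile(profile, xs):
    """Drop decompositions that still use xs; remove xs from the rest."""
    xs = set(xs)
    return frozenset(
        frozenset((free - xs, root, calls) for free, root, calls in dec)
        for dec in profile
        if not (xs & used_vars(dec))
    )


# Hypothetical fragment of a profile: one decomposition for the whole
# graph and one that splits the graph at the free variable b.
E1 = frozenset({ctx(["x1", "b", "x2"], ("even", ("x1", "x2")))})
E2 = frozenset({
    ctx(["x1", "b"], ("odd", ("x1", "b"))),
    ctx(["b", "x2"], ("odd", ("b", "x2"))),
})
P = frozenset({E1, E2})

forgotten = forget_profile(P, ["b"])
```

Forgetting b keeps only E1 (with b removed from its free variables), because E2 mentions b in its root calls and its underlying composition is no longer defined once b is not free.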


Composition for profiles. It remains to lift heap graph composition to profiles.
This is formalized as substituting predicate calls of contexts by other contexts:


Definition 5 (Context substitution). Let C1 = (x1, pred1(z1), calls1) and
C2 = (x2, pred2(z2), calls2) be contexts such that (1) pred1(z1) ∈ calls2 and (2)
no auxiliary variable of C2 is a free variable of C1 and vice versa. Then the
substitution of pred1(z1) in C2 by C1 is given by

\[
C_2[C_1] := (\mathbf{x}_1 \cup \mathbf{x}_2,\ \mathit{pred}_2(\mathbf{z}_2),\ (\mathit{calls}_2 \setminus \{\mathit{pred}_1(\mathbf{z}_1)\}) \cup \mathit{calls}_1).
\]
△


To compose profiles, we attempt to substitute the underlying contexts with each
other in all possible ways. Formally, a context decomposition E1 derives a context
decomposition E2, written E1 ▷ E2, iff there exist contexts C1, C2 ∈ E1 such that
E2 = (E1 \ {C1, C2}) ∪ {C2[C1]}.³ We denote by ▷* the reflexive-transitive closure
of the derivation relation ▷. The composition of two profiles then consists of all
context decompositions derivable from some decompositions of both profiles:


Definition 6 (Composition of profiles). Let P1 and P2 be profiles w.r.t. Φ.
Then the composition P1 ⊙ P2 of P1 and P2 is defined as

\[
P_1 \odot P_2 := \{E \mid \exists E_1 \in P_1, E_2 \in P_2\colon E_1 \cup E_2 \triangleright^* E\}.
\]
△
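
Definitions 5 and 6 can be read as a small closure computation. The following Python sketch uses the same hypothetical encoding as before (contexts as triples, decompositions and profiles as frozensets) and explores all decompositions derivable from E1 ∪ E2. It is a simplification: it only checks condition (1) of Definition 5, compares contexts by name rather than up to isomorphism, and assumes that variable names were chosen so that every attempted substitution is admissible.

```python
def ctx(free, root, calls=()):
    """Build a context (free variables, root call, calls)."""
    return (frozenset(free), root, frozenset(calls))


def substitute(c2, c1):
    """C2[C1]: replace the call of c2 that equals c1's root call by the
    calls of c1. Returns None if c1's root call is not called in c2."""
    free1, root1, calls1 = c1
    free2, root2, calls2 = c2
    if root1 not in calls2:
        return None
    return (free1 | free2, root2, (calls2 - {root1}) | calls1)


def derive_step(dec):
    """All decompositions obtained from dec by one substitution."""
    successors = set()
    for c1 in dec:
        for c2 in dec:
            if c1 == c2:
                continue
            merged = substitute(c2, c1)
            if merged is not None:
                successors.add((dec - {c1, c2}) | {merged})
    return successors


def compose_profiles(p1, p2):
    """P1 (.) P2: everything derivable from some E1 u E2."""
    result = set()
    for e1 in p1:
        for e2 in p2:
            todo = [e1 | e2]
            while todo:
                dec = todo.pop()
                if dec in result:
                    continue
                result.add(dec)
                todo.extend(derive_step(dec))
    return frozenset(result)


# Context C2 of Fig. 6 for a segment from x1 to x2 ...
C_LEFT = ctx(["x1", "x2"], ("odd", ("x1", "a")), [("even", ("x2", "a"))])
# ... and a context stating that a second segment from x2 to a is even.
C_RIGHT = ctx(["x2", "a"], ("even", ("x2", "a")))

composed = compose_profiles(
    frozenset({frozenset({C_LEFT})}), frozenset({frozenset({C_RIGHT})})
)
MERGED = ctx(["x1", "x2", "a"], ("odd", ("x1", "a")))
```

The result contains the unmerged decomposition {C_LEFT, C_RIGHT} and the derived singleton {MERGED}, which states that the glued graph is an odd list segment from x1 to a without any further extension.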


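Compositionality. Our lifted heap graph operations satisfy the compositionality
property mentioned in Sect. 1. That is,


Theorem 2. For all concrete heap graphs M, M' and every SID Φ, we have

\[
\begin{array}{rcl}
\mathit{rename}_{\mathbf{x},\mathbf{y}}(\mathit{profile}_\Phi(M)) & = & \mathit{profile}_\Phi(\mathit{rename}_{\mathbf{x},\mathbf{y}}(M)) \\
\mathit{forget}_{\mathbf{x}}(\mathit{profile}_\Phi(M)) & = & \mathit{profile}_\Phi(\mathit{forget}_{\mathbf{x}}(M)) \\
\mathit{profile}_\Phi(M) \odot \mathit{profile}_\Phi(M') & = & \mathit{profile}_\Phi(M \bullet M')
\end{array}
\]

provided that rename_{x,y}(M), forget_x(M), and M • M' are defined, respectively.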


Example 6. Recall from Fig. 6 the heap graphs M_odd and M_even whose profiles
w.r.t. Φ_o/e capture all singly-linked lists. We can construct a concrete heap graph
M representing a list of length five from x1 to x2 by computing

\[
M := \mathit{rename}_{a,x_2}(\mathit{forget}_{x_2}(M_{odd} \bullet \mathit{rename}_{(x_1,x_2),(x_2,a)}(M_{even}))).
\]

Then, by Theorem 2, the corresponding profile profile_{Φ_o/e}(M) is given by:

\[
\mathit{rename}_{a,x_2}(\mathit{forget}_{x_2}(\mathit{profile}_{\Phi_{o/e}}(M_{odd}) \odot \mathit{rename}_{(x_1,x_2),(x_2,a)}(\mathit{profile}_{\Phi_{o/e}}(M_{even})))).
\]

This profile, in turn, coincides with the profile of M_odd, i.e., we have
profile_{Φ_o/e}(M) = profile_{Φ_o/e}(M_odd).

In particular, notice that without the forget statement, we would obtain a heap
graph M' with an additional free variable. The additional free variable would
also influence the profile of M', because there exist more decompositions of
M' into heap graphs M1 • M2. Consequently, there are also more context
decompositions of M' and thus M' has a larger profile. △


³ Recall that all definitions are to be read up to isomorphism, i.e., auxiliary variables
of C1, C2, and E2 may be renamed prior to the substitution.


5 An Effective Decision Procedure for Entailment 


Profile analysis. We now exploit our abstract domain to develop a decision
procedure for entailments of the form pred1(a) ⊨_Φ pred2(b). Let us first consider
the case in which the parameters a and b coincide with the free variables in the
rules of the SID, i.e., a = fv(pred1) =: x1 and b = fv(pred2) =: x2. Our key
observation is then that analyzing profiles of the entailment's left-hand side
suffices to discharge it: The entailment pred1(x1) ⊨_Φ pred2(x2) holds iff the
profile of every model M of pred1(x1) contains a context decomposition stating
that a model of pred2(x2) can be reconstructed from M. Formally,


Theorem 3. The entailment pred1(x1) ⊨_Φ pred2(x2) holds iff for all concrete
heap graphs M with M ⊨_Φ pred1(x1), {(FV_M, pred2(x2), ∅)} ∈ profile_Φ(M).


Example 7. Recall the profiles profile_{Φ_o/e}(M_even) and profile_{Φ_o/e}(M_odd) from
Example 5 computed for models of sll(x1, x2) w.r.t. SID Φ_o/e (Fig. 5). We now
use these profiles to disprove the entailment sll(x1, x2) ⊨_{Φ_lists} odd(x1, x2): First,
observe that all predicates relevant for constructing models of odd(x1, x2) belong
to Φ_o/e ⊆ Φ_lists. Second, the profile profile_{Φ_o/e}(M_even) does not contain the
context decomposition {({x1, x2}, odd(x1, x2), ∅)}. Hence, by Theorem 3, the
entailment does not hold, as we cannot reconstruct M_even as a model of predicate
odd(x1, x2). △
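
Once the profiles are available, the check of Theorem 3 is a plain membership test. The Python sketch below replays Example 7 with the contexts of Fig. 6; the tuple encoding of contexts and the function name entailed are assumptions of this sketch, and contexts are compared by name, which is sufficient here because the target decomposition has no auxiliary variables.

```python
def ctx(free, root, calls=()):
    """Build a context (free variables, root call, calls)."""
    return (frozenset(free), root, frozenset(calls))


X = ("x1", "x2")
C1 = ctx(X, ("odd", ("x1", "x2")))
C2 = ctx(X, ("odd", ("x1", "a")), [("even", ("x2", "a"))])
C3 = ctx(X, ("even", ("x1", "a")), [("odd", ("x2", "a"))])
C4 = ctx(X, ("even", ("x1", "x2")))
C5 = ctx(X, ("even", ("x1", "a")), [("even", ("x2", "a"))])
C6 = ctx(X, ("odd", ("x1", "a")), [("odd", ("x2", "a"))])

# The two profiles of Example 5.
PROFILE_ODD = frozenset({frozenset({C1}), frozenset({C2}), frozenset({C3})})
PROFILE_EVEN = frozenset({frozenset({C4}), frozenset({C5}), frozenset({C6})})


def entailed(profiles_of_models, free_vars, rhs_call):
    """Theorem 3: every profile of a model of the left-hand side must
    contain the decomposition {(FV_M, rhs_call, {})}."""
    target = frozenset({ctx(free_vars, rhs_call)})
    return all(target in profile for profile in profiles_of_models)


# Models of sll(x1, x2) have one of the two profiles above.
sll_entails_odd = entailed(
    [PROFILE_ODD, PROFILE_EVEN], X, ("odd", ("x1", "x2"))
)
# Models of odd(x1, x2) all have the profile of M_odd.
odd_entails_odd = entailed([PROFILE_ODD], X, ("odd", ("x1", "x2")))
```

As in Example 7, sll_entails_odd is False because PROFILE_EVEN lacks the required decomposition, while the trivial entailment from odd to odd is confirmed.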


Computing profiles. By Theorem 3, to decide whether pred1(x1) ⊨_Φ pred2(x2)
holds, it suffices to compute the finite (by Lemma 3) set of all profiles of
models of pred1(x1). This is performed by the procedure abstractSID(Φ) shown in
Algorithm 1. To understand how the algorithm works, recall how predicates can
be unrolled to compute a model: We select an SID rule and replace all of its
predicate calls with previously computed models. By Lemma 1, this amounts to
performing heap graph operations. That is, we first rename the free variables of
previously computed models to match the parameters of predicate calls. After
that, the resulting models and the single allocation (due to the progress
condition) of the rule are composed into a single heap graph. Finally, we apply a
forget operation to remove free variables that have been existentially quantified.


Algorithm 1: The algorithm abstractSID(Φ) computes a function f
that maps each predicate pred ∈ Preds(Φ) to the set of profiles
{profile_Φ(M) | M ⊨_Φ pred(fv(pred))}.

 1  f_curr := λpred. ∅;
 2  repeat
 3      f_prev := f_curr;
 4      for pred ∈ Preds(Φ) do
 5          for (pred ⇐ ∃y: x ↦ z0 * pred1(z1) * ... * predk(zk)) ∈ Rules(Φ) do
 6              P0 := profile_Φ(x ↦ z0);
 7              for F1 ∈ f_prev(pred1), ..., Fk ∈ f_prev(predk) do
 8                  for i ∈ {1, ..., k} do
 9                      Pi := rename_{fv(predi),zi}(Fi);
10                  P := forget_y(P0 ⊙ P1 ⊙ ... ⊙ Pk);
11                  f_curr(pred) := f_curr(pred) ∪ {P};
12  until f_curr = f_prev
13  return f_curr

Algorithm 1 behaves analogously. However, instead of applying operations on
heap graphs, it applies our abstract operations on profiles (cf. Theorem 2): We
select an SID rule pred ⇐ φ in line 5. By Lemma 4, we can compute the profile
of the single allocation in φ (l. 6). We then select previously computed profiles
for the predicate calls of the rule and rename their free variables to match the
parameters of the predicate calls in φ (l. 7-9). Finally, the selected profiles are
composed, the existentially quantified variables are forgotten, and the result is
added to the computed profiles of predicate pred (l. 10, 11). The algorithm
then proceeds by computing profiles until a fixed point is reached (l. 12).
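
The control flow of Algorithm 1 can be tried out on a toy abstraction. In the Python sketch below, the parity of the number of allocated cells stands in for a profile; this stand-in, the rule encoding, and all names are assumptions of the sketch. Parity is not the abstract domain of this paper and is not a sound abstraction for entailment in general. The point is only the fixed-point iteration over rules and over all combinations of previously computed abstract values.

```python
from itertools import product

# Rules of the list SID of Fig. 5, reduced to the predicates each rule
# calls. Every rule additionally allocates exactly one variable.
RULES = {
    "sll": [[], ["sll"]],
    "odd": [[], ["even"]],
    "even": [["odd"]],
}


def abstract_sid(rules, alloc_value, combine):
    """Fixed-point iteration in the style of Algorithm 1."""
    f_curr = {pred: frozenset() for pred in rules}          # line 1
    while True:                                             # line 2
        f_prev = dict(f_curr)                               # line 3
        for pred, bodies in rules.items():                  # line 4
            for called in bodies:                           # line 5
                # lines 7-10: pick one known value per predicate call
                for choice in product(*(f_prev[q] for q in called)):
                    value = combine(alloc_value, choice)
                    f_curr[pred] = f_curr[pred] | {value}   # line 11
        if f_curr == f_prev:                                # line 12
            return f_curr                                   # line 13


def parity(alloc_value, called_values):
    """Toy stand-in for composing profiles: add up cell counts mod 2."""
    return (alloc_value + sum(called_values)) % 2


result = abstract_sid(RULES, 1, parity)
```

The iteration stabilizes after three rounds with the value sets {0, 1} for sll, {1} for odd, and {0} for even, mirroring the two profiles per list predicate found in Example 5.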


Correctness. Algorithm 1 is guaranteed to terminate due to the finiteness of our 
abstract domain (Lemma 3). Moreover, it computes the desired set of profiles: 


Theorem 4. abstractSID(Φ)(pred) = {profile_Φ(M) | M ⊨_Φ pred(fv(pred)) and
FV_M ⊆ fv(pred)}.


To check entailments pred1(a) ⊨_Φ pred2(b), where a and b do not coincide
with the free variables of pred1 and pred2 in the rules of Φ, it suffices to apply
an additional rename operation. Hence, by combining Theorems 3 and 4, we
obtain a constructive decidability proof for entailments between predicate calls.
Moreover, a close inspection of the size of the set of profiles and the runtime of
Algorithm 1 reveals that our decision procedure runs in time doubly exponential
in the size of a given SID. A detailed analysis is found in [1, Sect. 7.4].




Corollary 1. It is decidable in doubly exponential time whether the entailment
pred1(a) ⊨_Φ pred2(b) holds.


Generalizations. Several of our assumptions about SIDs and entailments have
been made purely to simplify the presentation. In fact, Corollary 1 can be
generalized to (1) decide entailments φ ⊨_Φ ψ for symbolic heaps φ, ψ (instead of
predicate calls) and (2) SIDs with pure formulas. Both extensions are supported
by our implementation. Further details are found in [1].


6 Experiments 


We implemented our decision procedure for entailment in the separation logic
prover HARRSH [1,15], which is written in Scala. HARRSH supports the full
SL_btw fragment, including pure formulas, parameter repetitions, and entailments
between symbolic heaps (as opposed to single predicate calls). Table 1 summarizes
the results of our evaluation for a selection of entailments and SIDs. Our full
collection of 101 benchmarks and all experimental results are available online [1].


Methodology. We compared HARRSH against SONGBIRD [19], the winner of
the SID entailment category of this year's separation logic competition,
SL-COMP'18; and against SLIDE [11], the tool that is most closely related to our
approach but that is complete only for a subclass of SL_btw. Experiments were
conducted using the popular benchmarking harness JMH on an Intel(R) Core™
i7-7500U CPU running at 2.70 GHz with a memory limit of 4 GB. We report the
average run times obtained by running JMH on each benchmark for 100 s.


Benchmarks. Besides the running example (with sll, even and odd as in Fig. 5)
and the entailments for doubly-linked trees discussed in the introduction (with
ltree, rtree as defined in Fig. 1), we show results on standard data-structure
specifications from the SL literature: several variants of trees with linked
leaves (tll [10], atll, tll”) and doubly-linked lists (dllht [18] defining lists
from head to tail, dllth from tail to head). Beyond lists and trees, we checked an
entailment between doubly-linked 2-grid segments (see Fig. 7) defined forwards
(dlgridr) and backwards (dlgridl).⁴


Fig. 7. dlgrid


Size of the abstraction. Besides the run times, we report the size of the
abstraction computed by HARRSH. More specifically, we report (1) the total number
of profiles in the fixed point of abstractSID (#P), (2) the total number of context
decompositions across all profiles (#D), and (3) the total number of contexts
across all decompositions of all profiles (#C). This shows that even though the
abstract domain Profiles^x(Φ) is very large in general, HARRSH typically only
needs to explore a small portion of it to decide an entailment.


⁴ Formal definitions of all SIDs are found in the supplementary material [1].


Table 1. The performance of HARRSH (HRS), SONGBIRD (SB) and SLIDE (SLD) on
a variety of SIDs; and the size of the abstraction computed by HARRSH. The timeout
(TO) was 180,000 ms. Termination before the timeout but without result is denoted
(U). Wrong results/crashes are marked (X).


Benchmark                                                   Time (ms)           Profiles
Query                                          Status    HRS     SB    SLD    #P   #D    #C
sll(x1, x2) ⊨ odd(x1, x2)                      false       4     11     43     2    6     6
even(x1, x2) ⊨ sll(x1, x2)                     true        2     26     48     2    4     4
rtree(x1, x2, x3) ⊨ ltree(x1, x2, x3)          false      16    (U)     53     3   14    21
Entailment (#) (Sect. 1), left to right        true      393     TO     53     7   70   116
Entailment (#) (Sect. 1), right to left        true      532   1274     54     9   57    87
atll(x1, x2, x3) ⊨ tll(x1, x2, x3)             true        98519        TO     2    2     2
tll(x1, x2, x3) ⊨ atll(x1, x2, x3)             false       2    119     TO     2    1     1
tll”(x1, x2, x3) ⊨ tll(x1, x2, x3)             true        2     34    (X)     3    3     4
dllht(x1, x2, x3, x4) ⊨ dllth(x3, x4, x1, x2)  true       16     37     50     3   27    45
dllth(x1, x2, x3, x4) ⊨ dllht(x3, x4, x1, x2)  true       16     37     50     3   27    45
dlgridr(x1, ..., x8) ⊨ dlgridl(x1, ..., x8)    true      172     TO    (X)     5   87   208


Results. Table 1 reveals that our decision procedure, being the first implemented
decision procedure that is complete for the entire SL fragment SL_btw, is not only
of theoretical interest, but can also solve challenging entailment problems
efficiently in practice. While SLIDE was faster on some benchmarks that fall into
the fragment defined in [11], as well as on some SIDs outside of that fragment,
HARRSH was able to solve several benchmarks on which SLIDE failed. On two
benchmarks, SLIDE produced errors: one wrong result and one program crash (the
first and the second entries marked by (X) in Table 1, respectively). We are unsure
whether the timeouts encountered on the TLL benchmarks are caused by a bug in
SLIDE, as SLIDE is quite efficient on other TLL variants (see [11, Table 1]).
Furthermore, note that HARRSH significantly outperformed SONGBIRD, providing
further evidence of the effectiveness of our profile-based abstraction.


7 Conclusion 


We presented an alternative proof for decidability of entailment in separation
logic with bounded tree width [10]. In contrast to the original proof, we give
a direct model-theoretic construction. We implemented the resulting decision
procedure in the tool HARRSH and obtained promising experimental results. For
future work, we plan to extend our approach to the bi-abduction problem.


Abstract. Digital bifurcation analysis is a new algorithmic method for
exploring how the behaviour of a parameter-dependent computer system
varies with a change in its parameters and, in particular, for identification
of bifurcation points where such variation becomes dramatic. We have
developed the method in analogy with traditional bifurcation theory
and have successfully applied it to models taken from systems biology.
In this case study paper, we demonstrate the appropriateness and
usefulness of digital bifurcation analysis as a push-button alternative
to the classical approaches traditionally used for analysing the stability
of TCP/IP protocols. We consider two typical examples (congestion
control and the influence of buffer sizes on throughput) and show that the
method provides the same results as obtained with classical non-automatic
analytical and numerical methods.


1 Introduction 


The objective of bifurcation theory is to study qualitative changes to the
properties of a parameter-dependent system as parameters are varied. The
method is typically applied to continuous-time or discrete-time dynamical
systems. Even a tiny change in parameters may cause a dynamical system to exhibit
entirely different qualitative features. Such dramatic changes in the topology of
the phase space of a dynamical system are known as bifurcations, and the values
of the parameters for which a bifurcation occurs are called bifurcation points.
For a complete global understanding of a complex dynamical system, it is
essential to know the bifurcation points, as well as the parameter ranges in which
there is no fundamental change. A simple example of a real-life bifurcation is
the phase transition of water to ice at the temperature of 0°C. At this critical
temperature, a tiny change in the temperature results in a “sudden” systematic
change in the substance. The two materials are governed by different sets of
parameters and qualitative properties. For example, we can talk about cracking
ice but not water.

Non-linear dynamical systems appearing in physics, biology or economics are
not the only source of bifurcation phenomena. Even computer systems can 


This work has been partially supported by the Czech Science Foundation grant No. 
18-001785. 
© The Author(s) 2019 


T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 339-356, 2019. 
https://doi.org/10.1007/978-3-030-17465-1_19 


340 N. Beneš et al. 


suddenly alter the quality of their behaviour. A simple example might be a 
significant performance degradation of a computation caused by system swap- 
ping. Studying bifurcations in computer systems can provide an additional for- 
mal analysis ingredient leading to a better understanding of critical systems 
properties, like stability or robustness. 

Inspired by the bifurcation theory for dynamical systems, we have developed 
an approach that allows analysing how the dynamics (runs, state transitions) 
of a discrete computer system changes when its parameters are changed [6,11]. 
We call the method digital bifurcation analysis. In the approach, the qualitative
changes in the behaviour are represented as changes in the truth value of temporal
formulae defining a specific behaviour (portrait) pattern of the system. The
method for computing results of the bifurcation analysis (typically presented as
bifurcation diagrams) uses our novel symbolic parallel parameter synthesis
algorithm [3], which itself builds on model-checking technology. As the approach
employs a hybrid temporal logic for which the algorithm is computationally
demanding, we have also developed specialised algorithms dedicated to some
specific formulae/patterns, which therefore work more efficiently.

Examples of such patterns are attractors, which we see as a particular class
of patterns representing the states of the system in which the system’s
execution persists in the long-time horizon, i.e., the so-called invariant subsets of
the state-space towards which the system’s runs are attracted. In computer 
systems, the most typical attractors can be observed in the form of terminal 
strongly connected components (tSCCs) [38]. We have developed an efficient 
parallel algorithm for detecting tSCCs in parametrised graphs in [1], and we use 
this algorithm in our two case studies. We have already successfully applied the 
digital bifurcation analysis to several models from systems biology [4,5]. 

In this case-study paper, we report on the application of digital bifurcation 
analysis to the Transmission Control Protocol (TCP) which currently facilitates 
most of the internet communication. One of the severe problems in practical 
applications of TCP is congestion, appearing when the required resources over- 
run the capacity of internet communication. Over the past years, many internet 
congestion control mechanisms have been developed to ensure the reliable and 
efficient exchange of information across the internet, such as Active Queue
Management (AQM). Bifurcation analysis of TCP under various congestion control
mechanisms has been studied by several authors [16,25,30,32,40,41]. All have
used a continuous-time model (e.g., the fluid model) and applied traditional
mathematical methods of bifurcation analysis, including simulations, to detect
parameter values at which the system passes through a critical point, loses
its stability, and undergoes a so-called Hopf bifurcation [22].

Our approach to bifurcation analysis does not require remodelling the given
discrete system in terms of a continuous-time dynamical system. Digital
bifurcation analysis works directly on discrete models represented as state
transition systems. Furthermore, the method is, unlike mathematical methods, fully
automatic and does not need mathematical skills to be utilised. Another
advantage is that the method is scalable to state spaces with tens of variables and
tens of possibly dependent parameters, thus significantly overcoming the limits
of traditional mathematical methods. Last but not least, the method is
advantageous in performing global bifurcation analysis, which is harder to compute
than the local analysis where bifurcation points are expected to be approximately
known in advance.

It is important to stress that the purpose of this case-study paper is not to 
propose any new congestion control mechanisms or protocols. We aim to provide 
a demonstration of the appropriateness and usefulness of the digital bifurcation 
analysis as a push-button technique that makes a promising alternative to the 
classical approaches when analysing stability and robustness of TCP protocol 
specifications and implementations. To that end, we consider two different case 
studies targeting TCP. In both of them, we analyse how the structure and qual- 
ity of attractors change when the parameters change. The first one deals with 
TCP that uses the Random Early Detection (RED) method [14] as an active 
queue management mechanism to control congestion. Although the RED mecha- 
nism alone is easy to understand, its interaction with TCP connections is rather 
complicated and is not well understood. In [33] the authors used a deterministic 
non-linear dynamical model of the TCP-RED protocol (together with detailed 
simulations) to demonstrate that the model exhibits a transition between a sta- 
ble fixed point and an oscillatory or chaotic behaviour as parameters are varied. 
In our case study, we were able to achieve the same results fully automatically 
using our method. In the second example, we consider TCP itself combined with 
essential performance-oriented extensions. We analyse how the sizes of the send 
and receive socket buffers influence the throughput; in particular, we identify 
the combinations of sizes (bifurcation points) for which we observe a dramatic 
drop. The results we have achieved are in accordance with [28]. 

It is worth noting that bifurcation analysis provides a conceptually very dif- 
ferent view of the protocol functionality than what is usually addressed by formal 
verification methods. The goal of verification is to prove the correctness of a sys- 
tem specification for all initial states and in the case of parametrised verification 
also regardless of the number of its components, or the parametrised domain of 
variables. On the other hand, the goal of bifurcation analysis of parametrised 
systems is to identify parameter values for which the system suddenly changes 
its behaviour regardless of its correctness. 

Several examples of the TCP protocol verification are in [8,13,18,23,35-37].
As regards parametric verification, the Bounded Retransmission Protocol (BRP)
has been checked by parametric model checking for manually derived constraints
in [19], and the Stop-and-Wait Protocol (SWP) has been targeted in [15] for all
possible values of the maximum sequence number and the maximum number of
retransmissions parameters. We are not aware of any formal verification method
that would address the bifurcations of the protocol behaviour.

Finally, we discuss the approaches related to bifurcation analysis. To the 
best of our knowledge, the only related approach to bifurcation analysis that 
also employs methods of formal verification has been presented in [20,21]. The 
authors address the identification of bifurcation points in non-trivial dynamics 
of a numerical cardiac-cell model represented using a hybrid automaton. The 
method is based on guided-search-based bounded-time reachability analysis used
to estimate ranges of parameter values displaying two complementary patterns
of the system’s behaviour. These ranges are computed for bounded-time reachability
and over-approximated up to a particular δ-precision due to the underlying
δ-decision algorithm.


2 Attractor Analysis Workflow 


We first describe the standard scenario for digital bifurcation analysis focused on 
attractor analysis. The input is a parametrised system and a certain classifica- 
tion of stability-based attractor properties that we are interested in. The system 
is in the model design phase formalised as a discrete finite-state model and sub- 
sequently via the state-space generation procedure turned into a parametrised 
graph. How the initial model is obtained and what language the model is writ- 
ten in is domain-specific and is explained later when describing the case studies. 
The classification of the attractor properties specifies what shapes and forms of 
attractors we want to consider distinct enough to express a dramatic change in 
the system’s behaviour. In the simplest case, which we call the counting ver- 
sion of our problem, we may be merely interested in the number of attractors 
and consider two parametrisations of a system non-equivalent if this number 
changes. More interesting cases may classify the attractors according to various 
stability-related properties, such as oscillations. The core parametric analysis
algorithm then computes the parametric tSCC map. The resulting map is
post-processed, producing e.g. a visualisation of the bifurcation diagram, plots, tables,
etc. The workflow of our method for the digital bifurcation analysis of attractors
is summarised in Fig. 1.


[Figure: workflow diagram. The system is formalised in the model design phase
as a discrete, finite-state model; state-space generation turns it into a
parametrised graph; parametric analysis, using the tSCC classification derived
from the stability-related attractor classification, produces the parametric
tSCC map; the map is then turned into a results visualisation (bifurcation
diagram, tables, plots, ...).]

Fig. 1. Attractor analysis method workflow.


In general, our digital bifurcation analysis algorithm presupposes that the
state space of the model has the form of a parametrised Kripke structure. In
this case study, we are interested in attractor properties that are independent
of the atomic proposition valuation. We therefore consider a simpler formalism
here, namely that of parametrised graphs, which are directed graphs with
self-loops allowed and edges labelled by parameters taken from a given parameter
set.


Definition 1. A graph is a pair $(V, E)$ where $V$ is a finite set of vertices and
$E \subseteq V \times V$ is a set of edges. A parametrised graph is a triple
$G = (V, E, P)$ where $P$ is a set of parametrisations and $E : V \times V \to 2^P$
such that for each $p \in P$, $G_p = (V, E_p)$ with
$E_p = \{(u, v) \mid p \in E(u, v)\}$ is a graph. We call $G_p$ the projection of $G$
on $p$.
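
As an illustration of Definition 1, the following minimal Python sketch stores a
parametrised graph as a dictionary from vertex pairs to sets of parametrisations
and computes the projection on one parametrisation. The function name `project`,
the dictionary `E` and the toy parametrisations "lo" and "hi" are hypothetical
and serve only as an example.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Vertex = Hashable
Param = Hashable
# E maps an ordered pair (u, v) to the set of parametrisations enabling that edge.
ParamEdges = Dict[Tuple[Vertex, Vertex], FrozenSet[Param]]


def project(edges: ParamEdges, p: Param) -> Set[Tuple[Vertex, Vertex]]:
    """Return E_p = {(u, v) | p in E(u, v)}, the edge set of the projection G_p."""
    return {uv for uv, params in edges.items() if p in params}


# Toy example (hypothetical data) with two parametrisations, "lo" and "hi".
E: ParamEdges = {
    ("a", "b"): frozenset({"lo", "hi"}),
    ("b", "a"): frozenset({"lo"}),
    ("b", "b"): frozenset({"hi"}),
}
E_lo = project(E, "lo")  # a and b lie on a common cycle
E_hi = project(E, "hi")  # a leads to b, and b only has a self-loop
```

In this toy graph the projection on "lo" has the single tSCC {a, b}, whereas the
projection on "hi" has the single tSCC {b}.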


To be able to investigate the properties of the attractors in the system, we 
need to use a notion that is analogous to an attractor in a parametrised graph. In 
dynamical systems theory, an attractor [27] is the smallest set of states (points 
in the phase space) invariant under the system dynamics. Parametrised graphs 
can be regarded as discrete abstractions of a dynamical system in which the 
dynamics are represented using paths in the graph. The respective abstraction 
of the notion of an attractor thus coincides with the notion of a terminal strongly 
connected component (tSCC) of a graph. 


Definition 2. Let $G = (V, E)$ be a graph. We say that a vertex $t \in V$ is
reachable from a vertex $s \in V$ if $(s, t) \in E^*$ where $E^*$ denotes the reflexive and
transitive closure of $E$. A set of vertices $C \subseteq V$ is strongly connected if $v$ is
reachable from $u$ for any two vertices $u, v \in C$. A strongly connected component
(SCC) is a maximal strongly connected set $C \subseteq V$, i.e. such that no $C'$ with
$C \subsetneq C' \subseteq V$ is strongly connected. A strongly connected component $C$ is called
terminal (tSCC) if $(C \times (V \setminus C)) \cap E = \emptyset$, i.e. there are no edges leaving $C$.


We are now ready to state the algorithmic problem whose solution forms the 
basis of our method. 


Terminal SCCs Enumeration Problem. Let $G = (V, E, P)$ be a parametrised
graph. The goal is to enumerate, for every parametrisation $p \in P$,
all tSCCs in the graph $G_p$, the projection of $G$ on $p$.

In this general version of our problem, the output is going to be a mapping
that assigns to each $p \in P$ the set of all tSCCs of $G_p$. We call this the parametric
tSCC map. This map may be then further processed and visualised. We are
mainly interested in the bifurcation diagram of the model. This diagram is a 
plot which partitions the parameter space into regions where the behaviour of 
the system is qualitatively invariant. In the case of a single parameter, this type 
of one-dimensional diagram is typically augmented by a second dimension which 
presents the location of the tSCCs with respect to a chosen system variable. 

To be able to distinguish between qualitatively different behaviour of the
system, we need to formalise the classification of stability-based attractor
properties in terms of tSCCs. We thus get a classification function that separates
tSCCs into classes. Two parametrisations of a system are then said to be
qualitatively different if their respective graphs differ in the number of tSCCs
belonging to at least one of the classes. In the case of the counting version, we
thus consider one class of tSCCs only. Here, parametrisations of a system are
considered to be qualitatively different if their graphs contain a different number
of tSCCs. In the more
detailed cases, we can classify tSCCs according to size (small vs large), density
(sparse vs dense), graph-specific properties (bipartite vs non-bipartite) etc.

For an example of how these classifications relate to the classical bifurca- 
tion analysis, we may see bipartite tSCCs as representing oscillatory patterns in 
attractors. The change from a small non-bipartite tSCC to a bipartite tSCC can 
be thus seen as an analogy of the Hopf bifurcation. In our two case studies, we
distinguish between sinks (single-state tSCCs), bipartite (oscillatory) tSCCs, and
other tSCCs, which are further differentiated between small and large, based on
a chosen domain-specific threshold.
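
The classification used in the case studies can be sketched in Python as follows.
This is only an illustrative sketch: the function names, the class labels and the
decision to test bipartiteness on the underlying undirected graph are assumptions
made here, and the threshold is left as a free argument.

```python
from collections import Counter, deque
from typing import Collection, Hashable, Iterable, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def is_bipartite(component: Set[Vertex], edges: Collection[Edge]) -> bool:
    """2-colour the subgraph induced by `component`, ignoring edge direction."""
    adj = {v: set() for v in component}
    for u, v in edges:
        if u in component and v in component:
            if u == v:
                return False  # a self-loop is an odd cycle
            adj[u].add(v)
            adj[v].add(u)
    colour = {}
    for start in component:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False
    return True


def classify(component: Set[Vertex], edges: Collection[Edge], threshold: int) -> str:
    """Return one of 'sink', 'bipartite', 'small', 'large'."""
    if len(component) == 1:
        return "sink"
    if is_bipartite(component, edges):
        return "bipartite"
    return "small" if len(component) <= threshold else "large"


def signature(tsccs: Iterable[Set[Vertex]], edges: Collection[Edge], threshold: int) -> Counter:
    """Number of tSCCs per class for one parametrisation."""
    return Counter(classify(c, edges, threshold) for c in tsccs)
```

Under this sketch, two parametrisations are treated as qualitatively different
exactly when their signatures differ.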

The rest of this section gives a brief overview of the parallel algorithm for 
solving the tSCCs enumeration problem that we have developed in [1]. 


2.1 Core Algorithm 


First, note that a simple sequential solution to the problem is to use any reason- 
able SCC decomposition algorithm (e.g. Tarjan’s [39]) and enumerate the tSCCs 
in the residual graph. However, all known optimal sequential SCC decomposi- 
tion algorithms use the depth-first search algorithm, which is suspected to be 
non-parallelisable [34]. There are known parallel SCC decomposition algorithms; 
for a survey, we refer to [2]. Our approach is based on the observation that we 
do not have to compute all of the SCCs to enumerate the terminal ones. 

Furthermore, instead of scanning through all parametrisations and solving
the problem for every one of them separately, our approach deals with sets of
parametrisations directly. This makes our algorithm suitable for use in
connection with various kinds of symbolic set representations. The reason for using a
parallel algorithm is the necessity to deal with the high computational demands 
of the method as discussed in [1]. 

The main idea of the Terminal Component Detection (TCD) algorithm lies in
repeated reachability, which is known to be easily parallelisable. To explain the
method, we start with a non-parametrised version of the algorithm. The following
explanation is illustrated in Fig. 2. Let us assume a given (non-parametrised)
graph $G = (V, E)$. We choose an arbitrary vertex $v \in V$ (denoted by the double
circle in the illustration) and compute all vertices reachable from $v$; let us call the
resulting set of vertices $F$. We further compute the set of all vertices
backwards-reachable from $v$ inside $F$; we call the resulting set $B$. Finally, we compute all
vertices backwards-reachable from any vertex of $F$; let us call this set $B'$.

Clearly, $B$ is an SCC of the graph, and moreover, it is a terminal SCC iff
$F \setminus B$ is empty. Furthermore, $B' \setminus F$ contains no tSCCs: all vertices in $B' \setminus F$ have
a path to a vertex in $F$. We recursively run the algorithm in $F \setminus B$ and $V \setminus B'$ if
non-empty. Observe that no tSCC may intersect both of these sets and these two
subproblems can be thus dealt with independently (i.e. in parallel). Note that
every time the algorithm is (recursively) started, its input is an induced subgraph
of the original graph that satisfies the precondition that all its tSCCs are tSCCs
of the original graph. These observations together imply the correctness of the
algorithm.

Fig. 2. Illustration of the non-parametrised version of our algorithm.
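
The non-parametrised procedure can be sketched sequentially in Python as follows.
This is only an illustrative sketch assuming an explicit edge list: the pivot is
picked arbitrarily, the two subproblems are processed from a work list instead of
in parallel, and no heuristics from [1] are included. The names `reach` and
`terminal_sccs` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Vertex = Hashable


def reach(start: Set[Vertex], nxt: Dict[Vertex, Set[Vertex]],
          universe: Set[Vertex]) -> Set[Vertex]:
    """Vertices of `universe` reachable from `start` along `nxt` inside `universe`."""
    seen = set(start) & universe
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in nxt.get(u, ()):
            if v in universe and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def terminal_sccs(vertices: Iterable[Vertex],
                  edges: Iterable[Tuple[Vertex, Vertex]]) -> List[FrozenSet[Vertex]]:
    succ: Dict[Vertex, Set[Vertex]] = {}
    pred: Dict[Vertex, Set[Vertex]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    result = []
    work = [set(vertices)]
    while work:
        universe = work.pop()
        if not universe:
            continue
        pivot = next(iter(universe))            # arbitrary choice of v
        F = reach({pivot}, succ, universe)      # forward reachable from v
        B = reach({pivot}, pred, F)             # backward reachable from v inside F
        B_prime = reach(F, pred, universe)      # backward reachable from F
        if F == B:
            result.append(frozenset(B))         # F \ B is empty, so B is terminal
        else:
            work.append(F - B)                  # subproblem F \ B
        work.append(universe - B_prime)         # subproblem V \ B'
    return result
```

Because the pivot is arbitrary, the order of the returned components is not
fixed; only the set of components is.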

The asymptotic complexity of the algorithm in its non-parametric version is
of the order $O(|V| \cdot (|V| + |E|))$ as in the worst case, every iteration may eliminate
only a single vertex of the graph. The actual performance of the algorithm strongly
depends on the choice of the initial vertex $v$. If we consistently choose $v$ that
lies close to (or directly in) a tSCC of the graph, the complexity gets linear.
Of course, such a choice cannot be made in advance. The paper [1] discusses the
impact of several heuristics that try to approximate this choice.

The algorithm can also be made more efficient using a trimming subprocedure 
in the manner of [26], i.e. removing all vertices without incoming edges. In Fig. 2, 
the removed vertices are marked in grey; furthermore, the V\ B’ part of the graph 
contains one vertex that would be removed in the next recursive run. 

To extend the basic idea to parametrised graphs, we use a notion of
parametrised sets of vertices. Formally, a parametrised set of vertices $A$ is a
function $A : V \to 2^P$. To deal with parametrised sets, we use a generalisation
of the standard set operations. All the operations are performed element-wise,
e.g. the union of parametrised sets $A \cup B$ is defined as the parametrised set $C$
such that $C(v) = A(v) \cup B(v)$ for all $v$. The parametrised set of all vertices and
all parametrisations is given by $\mathcal{V}$ such that $\mathcal{V}(v) = P$ for all $v \in V$.

The notions of the forward and backward reachable sets can be easily
extended to the parametrised setting. They can be computed by a fixed-point
algorithm which iterates the parametrised successor (or predecessor) operator.
Given a parametrised set of vertices $X$, the successor operator computes the
parametrised set $Y$ such that $Y(v) = X(v) \cup \bigcup_{u \in V} (X(u) \cap E(u, v))$ and
similarly for the predecessor operator.
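
The successor operator and the resulting fixed-point computation can be sketched
in Python with explicitly represented parameter sets. This is a simplification
made for illustration only (a symbolic encoding would replace the Python sets);
the names `successor` and `forward_reachable` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Vertex = Hashable
Param = Hashable
ParamSet = Dict[Vertex, Set[Param]]
ParamEdges = Dict[Tuple[Vertex, Vertex], FrozenSet[Param]]


def successor(X: ParamSet, E: ParamEdges, vertices: Iterable[Vertex]) -> ParamSet:
    """One step of Y(v) = X(v) ∪ ⋃_u (X(u) ∩ E(u, v))."""
    Y = {v: set(X.get(v, ())) for v in vertices}
    for (u, v), params in E.items():
        # A parametrisation moves from u to v only if it is present at u
        # and enables the edge (u, v).
        Y[v] |= set(X.get(u, ())) & params
    return Y


def forward_reachable(X: ParamSet, E: ParamEdges, vertices: Iterable[Vertex]) -> ParamSet:
    """Iterate the successor operator until nothing changes."""
    vertices = list(vertices)
    current = {v: set(X.get(v, ())) for v in vertices}
    while True:
        nxt = successor(current, E, vertices)
        if nxt == current:
            return current
        current = nxt
```

The predecessor operator is obtained in the same way by swapping the roles of
`u` and `v` when traversing the edges.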

The parametrised algorithm then proceeds as described above,
extended to parametrised sets. One further key difference is that instead
of choosing one starting vertex, we need to choose a set of starting vertices with 
disjoint parametrisation sets that together cover all parametrisations that are 
present in the currently explored parametrised subgraph. The reason for this, as 
well as a discussion on heuristics that allow choosing such sets efficiently, can be 
again found in [1]. 

In the worst case, when parametrisations are represented explicitly, the
asymptotic complexity of the algorithm is of the order $O(|P| \cdot |V| \cdot (|V| + |E|))$.
The actual performance of the algorithm depends on various choices and
heuristics. It can also be strongly influenced by the usage of a symbolic encoding of
the parametrised sets. In this paper, the sets of parametrisations are represented 
symbolically using an interval encoding, similar to the one used in [10]. Other 
options for a symbolic representation of parameters include SMT formulae [3]. 


3 Case Studies 


In this section, we present two case studies focusing on discovering bifurcations in 
the behaviour of the TCP protocol. Each of them addresses a different essential 
aspect of the protocol, namely congestion control and packet flow stability. We 
demonstrate how the digital bifurcation analysis can aid in the design, analysis 
and control of these discrete reactive systems. 

In the first case study, we consider a relatively common setting in the stan- 
dard bifurcation theory: A discrete map governing the behaviour of the RED 
congestion control mechanism. This mechanism prevents congestion on network 
nodes such as routers and is subject to changes in its behaviour due to different 
internal and external parameters. We show how different parameters influence 
the stability of the mechanism and how a hypothetical system administrator or 
an automated controller can use this information to avoid faulty behaviour. 

The second case study presents an entirely discrete model of the basic TCP 
focusing on the stability of packet flow. We study the influence of the sender 
and receiver buffer sizes on the behaviour of the protocol and its ability to 
transfer packets in a timely manner. We assume the role of a hypothetical pro- 
tocol designer and consider a set of extensions and modifications to the protocol 
proposed by various networking experts. We observe that such extensions and 
their interplay can introduce bifurcations leading to serious degradation of the 
protocol performance. 

The case studies are implemented with the help of the tool Pithya [7] which 
provides the necessary parametrised graph analysis algorithms. The source code 
of this implementation is available at https://github.com/sybila/tcp-bifurcation. 
All experiments were performed on a typical 4-core 3 GHz desktop computer with 
16 GB of RAM. 
3.1 Instabilities in TCP-RED


This case study addresses the congestion control in TCP. The congestion control 
mechanism prevents the protocol from overloading the network with too many 
packets. The problem has two important aspects. The first aspect is the con- 
gestion control on the sender side that has to ensure maximal throughput for 
a single flow of packets. The second aspect is the congestion control on other 
network nodes, such as routers, where several connections meet. 

One of the common approaches to implementing the congestion control on
routers is the Random Early Detection (RED) method proposed in [14]. This
technique explicitly drops packets as the router queue starts to fill up. Consequently,
senders are indirectly notified (by observing the packet loss) that the link is
approaching a congested state before the situation becomes critical.


Model Description. To study the RED mechanism, we use a discrete time 
model proposed in [33]. In Fig.3, we present the model equations and a basic 
description of all model variables and constants. Detailed aspects of the model 
design are given in the original paper. 


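$$p_t(\bar{q}_t) = \begin{cases} 0 & \bar{q}_t \in [0, q_l] \\ \frac{\bar{q}_t - q_l}{q_u - q_l} \, p_{max} & \bar{q}_t \in (q_l, q_u) \\ 1 & \bar{q}_t \in [q_u, B] \end{cases} \qquad (1)$$

$$q_t(p_t) = \begin{cases} B & p_t \in [0, p_l] \\ \frac{n \cdot k}{\sqrt{p_t}} - \frac{c \cdot d}{m} & p_t \in (p_l, p_u) \\ 0 & p_t \in [p_u, 1] \end{cases} \qquad (2)$$

$$\bar{q}_{t+1}(\bar{q}_t) = (1 - w) \cdot \bar{q}_t + w \cdot q_t(p_t(\bar{q}_t)) \qquad (3)$$

Variables:
  drop rate $p_t \in [0, 1]$
  queue size $q_t \in [0, B]$
  average queue size $\bar{q}_t \in [0, B]$

Constants:
  maximum buffer size $B = 3750$
  lower queue threshold $q_l = 250$
  upper queue threshold $q_u = 750$
  packet size $m = 4$ kb
  maximum drop rate $p_{max} = 0.1$
  number of TCP connections $n = 250$
  propagation delay $d = 0.1$ s
  link capacity $c = 75$ Mb/s
  averaging weight $w = 0.15$
  rate constant $k = \sqrt{3/2}$
  lower drop threshold $p_l = \left(\frac{n \cdot m \cdot k}{d \cdot c + B \cdot m}\right)^2$
  upper drop threshold $p_u = \left(\frac{n \cdot m \cdot k}{d \cdot c}\right)^2$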


Fig. 3. A discrete time model of the RED congestion control behaviour. The individual 
constants are stated with basic explanations and default values. The parameters are 
selected from the given set of constants and their bounds are specified later with the 
corresponding experiments. 


The model assumes $n$ connections flowing through a single RED-capable
router. All connections share basic properties, namely the packet size and the
propagation delay. In such a case, the situation can be simplified by considering
only a single combined flow, as the router cannot differentiate between the
individual flows anyway. The router then maintains the current drop rate $p_t$ (Eq. 1)
and the queue size $q_t$ (Eq. 2) based on the current exponentially weighted average
queue size $\bar{q}_t$ (Eq. 3).
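
For illustration, the discrete map of Eqs. 1-3 can be sketched in Python as
follows. The sketch assumes the linear ramp between the queue thresholds in Eq. 1
and the branch $n \cdot k / \sqrt{p_t} - c \cdot d / m$ in Eq. 2, and it reads
m = 4 kb as 4000 bits and c = 75 Mb/s as 75e6 bits per second; the unit
conversion and all function names are assumptions made here.

```python
import math

# Default constants from Fig. 3. Unit conversion is an assumption:
# m = 4 kb is read as 4000 bits and c = 75 Mb/s as 75e6 bits per second.
DEFAULTS = {
    "B": 3750.0, "q_l": 250.0, "q_u": 750.0, "p_max": 0.1,
    "n": 250.0, "m": 4000.0, "d": 0.1, "c": 75e6,
    "k": math.sqrt(3.0 / 2.0), "w": 0.15,
}


def drop_rate(q_avg, par=DEFAULTS):
    """Eq. 1: drop rate p_t as a function of the average queue size."""
    if q_avg <= par["q_l"]:
        return 0.0
    if q_avg >= par["q_u"]:
        return 1.0
    return (q_avg - par["q_l"]) / (par["q_u"] - par["q_l"]) * par["p_max"]


def queue_size(p, par=DEFAULTS):
    """Eq. 2: queue size q_t as a function of the drop rate."""
    nmk = par["n"] * par["m"] * par["k"]
    p_l = (nmk / (par["d"] * par["c"] + par["B"] * par["m"])) ** 2
    p_u = (nmk / (par["d"] * par["c"])) ** 2
    if p <= p_l:
        return par["B"]
    if p >= p_u:
        return 0.0
    return par["n"] * par["k"] / math.sqrt(p) - par["c"] * par["d"] / par["m"]


def next_average(q_avg, par=DEFAULTS):
    """Eq. 3: exponentially weighted average queue size at step t + 1."""
    return (1.0 - par["w"]) * q_avg + par["w"] * queue_size(drop_rate(q_avg, par), par)


def trajectory(q0, steps, par=DEFAULTS):
    """Iterate the map from q0 and return the visited average queue sizes."""
    qs = [q0]
    for _ in range(steps):
        qs.append(next_average(qs[-1], par))
    return qs
```

Under these assumptions and the default constants, the map crosses the diagonal
for an average queue size between 340 and 350; a parameter such as `w` or `n`
can be varied by passing a modified copy of `DEFAULTS`.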

A typical scenario is that a network administrator takes control over
parameters such as the averaging weight $w$ or the queue thresholds $q_l$ and $q_u$.
Furthermore, it is also important to consider the influence of the connection count
$n$ and the propagation delay $d$, as these numbers will change depending on the
current network load.


Parametrised Graph. To analyse the model, we require a finite parametrised
graph $G = (V, E, P)$. Here, $P$ is the parameter space given by the chosen model
parameters (we specify the chosen parameters for each experiment later). In
Eq. 3, we write $\bar{q}_{t+1}(\bar{q}_t, \lambda)$ for $\lambda \in P$ to specify the parametrised version of the
model.

We assume $s + 1$ thresholds $t_0 < t_1 < \ldots < t_s$ such that $t_0 = 0$ and $t_s = B$.
These thresholds partition the state space of the variable $\bar{q}$ into $s$ intervals
$[t_0, t_1], \ldots, [t_{s-1}, t_s]$, denoted as $I_1, \ldots, I_s$. These intervals then represent vertices
of our parametrised graph $V = \{I_i \mid i \in [1, s]\}$.

Next, we construct the parametrised edges between our intervals so that they
over-approximate the behaviour of the original discrete map. Let us consider two
intervals $I_i$ and $I_j$ and the edge from $I_i$ to $I_j$. Clearly, the set of parametrisations
$E(I_i, I_j)$ has to include all parametrisations $\lambda$ such that for some $\bar{q}_t \in I_i$ it holds
that $\bar{q}_{t+1}(\bar{q}_t, \lambda) \in I_j$. We compute these sets using interval arithmetic, ensuring
that all such parametrisations are included.
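
The edge construction can be sketched in Python as follows. The sketch is a
simplification made for illustration: parametrisations are enumerated explicitly
rather than handled as intervals, intervals are indexed from 0, and the map is a
hypothetical toy map $f(x, w) = (1 - w) \cdot x + w \cdot (B - x)$ with
$w \in [0, 1]$, not the RED model. The names `build_edges` and `toy_enclosure`
are assumptions made here.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Interval = Tuple[float, float]


def build_edges(thresholds: List[float],
                params: Iterable[float],
                enclosure: Callable[[Interval, float], Interval]
                ) -> Dict[Tuple[int, int], Set[float]]:
    """Map each pair (i, j) of interval indices to the parametrisations whose
    image enclosure of interval i intersects interval j."""
    cells = list(zip(thresholds, thresholds[1:]))
    edges: Dict[Tuple[int, int], Set[float]] = {}
    for lam in params:
        for i, cell in enumerate(cells):
            lo, hi = enclosure(cell, lam)
            for j, (a, b) in enumerate(cells):
                if lo <= b and hi >= a:  # closed intervals intersect
                    edges.setdefault((i, j), set()).add(lam)
    return edges


B = 4.0  # hypothetical upper bound of the toy state space


def toy_enclosure(cell: Interval, w: float) -> Interval:
    """Interval evaluation of f(x, w) = (1 - w) * x + w * (B - x) for 0 <= w <= 1.
    The first term is increasing and the second is decreasing in x, so each
    term is bounded separately, as plain interval arithmetic does."""
    a, b = cell
    return ((1 - w) * a + w * (B - b), (1 - w) * b + w * (B - a))


edges = build_edges([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 1.0], toy_enclosure)
```

For w = 0.5 the toy map is constant (every point is mapped to 2), yet the
enclosure of [0, 1] is [1.5, 2.5]. Both neighbouring intervals would be hit even
by the exact image, since closed intervals share the endpoint 2, but the extra
width of the enclosure shows how interval arithmetic over-approximates.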

Finally, since our graph over-approximates the original discrete map, each
tSCC over-approximates some attractor(s) of the original system. Furthermore,
the precision of this over-approximation can be refined by introducing additional
thresholds or replacing interval arithmetic with a more sophisticated
approximation method, e.g., Taylor models [24].


Analysis Results. The analysis procedure consists of two scenarios: 

Scenario 1: Consider a system designer who studies the effects of parameters
to assess correct settings ensuring the stable behaviour of the protocol. In
Fig. 4(a) and (b), the locations and types of attractors are shown for
parameters $w$ and $n$, respectively. It can be seen that increasing the parameter $w$
has a destabilising effect: the small (stable) tSCC (component size $< 0.01 \cdot B$)
turns into a bipartite tSCC (representing oscillation) and finally into a large
non-bipartite tSCC. On the other hand, the effect of the connection count $n$ is
complementary: a higher number of connections stabilises the behaviour (Fig. 4b).
Additionally, the protocol behaves as expected in the stable region: $w$ does not
influence the location of the steady state, whereas a higher number of connections
requires higher queue sizes to accommodate the increased data flow. Using this
kind of analysis, a general overview of the system’s behaviour w.r.t. the given
parameters can be directly obtained in a matter of minutes.

Fig. 4. Bifurcation diagrams showing the location and character of the tSCC depending
on model parameters in the RED model. The green region indicates a small component
($< 0.01 \cdot B$), the blue region shows oscillatory behaviour (bipartite graph), and the red
region corresponds to a large non-bipartite tSCC. (a) $w \in [0.1, 0.2]$ and $n = 250$;
(b) $n \in [200, 300]$ and $w = 0.15$; (c) $w \in [0.1, 0.2]$ and $n \in [200, 300]$. (Color figure
online)

Scenario 2: Assume an administrator (or an automated controller) is sup- 
posed to adjust the parameter w to preserve the correct functionality of the 
system subject to a varying number of connections n. In Fig. 4c, it is shown how 
the character of the attractor changes with the controllable parameter w and the 
external condition n. This allows the administrator to select optimal values for 
the given situation. Note that while this specific type of diagram does not show 
the concrete location of components, it is still contained in the method results 
and can be used to support the decision further. While this type of analysis is 
certainly more computationally challenging, it can still be performed in under 
one hour. 


3.2 Packet Flow Stability 


The TCP specification as defined in RFC 793 [31] provides a fundamental 
description of the TCP protocol such as the packet format or the state machine 
for event processing. However, many implementation and performance aspects 
were not addressed in the original specification. Therefore in the subsequent 
years, several extensions and improvements of the protocol functionality have 
been introduced [9,12, 29]. 

Nowadays, many well-tested, production-ready implementations of TCP
exist. However, as demonstrated in [28], non-standard network configurations 
and combinations of various modifications can cause problems even in well- 
established implementations. Furthermore, new implementations are still being 
developed where such fundamental problems can easily re-appear [17]. 

In this case study, we assume the role of a hypothetical protocol engineer.
We introduce a basic parametrised model of TCP according to RFC 793 [31]
extended with two performance-oriented modifications, namely delayed
acknowledgement and Nagle’s algorithm. We observe that these modifications, while
useful in many instances, can introduce unexpected bifurcations in the behaviour
of the protocol. Additionally, we compare our results with [28].


Model Description. We consider a model of TCP based on RFC 793 [31]
extended with Nagle’s algorithm according to RFC 896 [29] and delayed
acknowledgement according to RFC 813 [12] and RFC 1122 [9]. We assume a single
sender which sends a unidirectional infinite stream of data to a single receiver
connected by a reliable link with unlimited capacity. As parameters, we assume
a fixed maximal buffer size S for the sender and R for the receiver. Finally, the
size of each packet is limited by the Maximum Segment Size (MSS) set by the
network administrator.

Since we are not interested in the exact values of the transmitted data bytes, 
we can model the state of the protocol using the number of bytes in each protocol 
phase. This abstraction leads to the following five state variables: 


- W: the number of bytes in the send buffer waiting to be sent;
- D: the list of data packet sizes in transit;
- U: the number of bytes in the receive buffer waiting to be acknowledged;
- A: the list of acknowledgement packets in transit;
- ACK: the out-of-order acknowledgement flag.


Furthermore, we use outstanding to denote the number of unacknowledged 
bytes (U plus the sum of all elements in D and A). Since the protocol is not 
limited by the link capacity, we assume the available window is always equal to 
min(S,R) minus outstanding bytes. Notice that all the bytes considered by 
the model variables must be stored in the send buffer (the sender must keep 
the data until acknowledgement arrives), whereas only the bytes waiting to be 
acknowledged are stored in the receive buffer. 

The dynamics of the model are governed by a set of discrete asynchronous
events. Each event can only be executed when its preconditions are met. As our
parametrised graph, we consider the graph of the protocol states reachable from
the initial configuration where all channels are empty, and all variables are zero.
The model consists of the following discrete events:


Copy data from the application: Before sending, the data needs to be copied from 
the application to the kernel memory where the networking layer operates. This 
occurs in 1024-byte chunks such that at least for every four chunks, the copying 
is interrupted to send available data right away [28] if possible: 
W = W + k · 1024; where k ∈ [1..4] is maximal
such that (k · 1024 + W + outstanding ≤ S)


Send full packet: When MSS unsent bytes are available in the send buffer and 
the window capacity is sufficient, a full packet can be constructed and sent: 


W = W − MSS; D = append(D, MSS); when (window ≥ MSS ∧ W ≥ MSS)


Send partial packet: When fewer than MSS unsent bytes are available, or the
window is not large enough, the protocol can decide to send a partial packet. 
This decision is governed by Nagle’s algorithm which dictates that a partial 
packet can be sent only when there are no outstanding bytes. This criterion 
prevents the sender from sending unnecessary small packets in an unbuffered 
stream of data: 


W = W − packet; D = append(D, packet); where
(packet = min(window, MSS, W) ∧ outstanding = 0)


Receive and acknowledge packet: The receiver can process and acknowledge any 
data packet (we assume the data is immediately handed over to the application). 
However, to avoid a large number of small acknowledgement packets, the packet 
acknowledgement is often delayed until a sufficient amount of data is received 
(RFC 813). In our case, we use the threshold specified in [28] — 35% of R. In 
RFC 1122, this rule is further augmented to send an acknowledgement packet 
whenever two full segments are received: 


A = append(A, U + head(D)); D = tail(D); U = 0; when
(|D| > 0 ∧ U + head(D) ≥ min(0.35 · R, 2 · MSS))


Receive without acknowledgement: When the rules of delayed acknowledgement 
are not met, the data bytes are transferred to the receive buffer instead: 


U = U + head(D); D = tail(D); when
(|D| > 0 ∧ U + head(D) < min(0.35 · R, 2 · MSS))


Out-of-order acknowledgement: According to RFC 813, when data is received
without immediate acknowledgement, a 200 ms timer should be started to
acknowledge the data if no acknowledgement packet is generated in the
meantime. However, as discussed in [28], regularly rescheduling such a timer can
be an expensive operation. Therefore a cyclic timer acknowledging all received
data every 200 ms is often used instead. In our model, we include this design
decision by allowing one non-deterministic out-of-order acknowledgement packet
to occur:


A = append(A, U); U = 0; ACK = 1; when (U > 0 ∧ ACK = 0)


Process acknowledgement: The data cannot be removed from the send buffer
until it is acknowledged. Thus whenever there is an acknowledgement packet
in transit, the packet can be processed by the sender:


A = tail(A); when |A| > 0 
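
The events listed above can be read as a successor function over the five state variables. The following is a minimal Python sketch of that reading, not the authors' implementation: the names State, successors and reachable are illustrative, the guards are assumed to use non-strict comparisons (≤, ≥), and the extra packet > 0 test on partial packets is an added assumption that rules out empty packets.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    W: int = 0      # bytes in the send buffer waiting to be sent
    D: tuple = ()   # sizes of data packets in transit
    U: int = 0      # received bytes waiting to be acknowledged
    A: tuple = ()   # sizes of acknowledgement packets in transit
    ACK: int = 0    # out-of-order acknowledgement flag


def successors(s, S, R, MSS):
    """Yield (event, next_state) for every event enabled in state s."""
    outstanding = s.U + sum(s.D) + sum(s.A)
    window = min(S, R) - outstanding
    threshold = min(0.35 * R, 2 * MSS)

    # Copy data from the application: maximal k in 1..4 that still fits.
    ks = [k for k in range(1, 5) if k * 1024 + s.W + outstanding <= S]
    if ks:
        yield "copy", replace(s, W=s.W + max(ks) * 1024)
    # Send full packet.
    if window >= MSS and s.W >= MSS:
        yield "send_full", replace(s, W=s.W - MSS, D=s.D + (MSS,))
    # Send partial packet (Nagle: only when nothing is outstanding).
    packet = min(window, MSS, s.W)
    if outstanding == 0 and packet > 0:  # packet > 0 is an added assumption
        yield "send_partial", replace(s, W=s.W - packet, D=s.D + (packet,))
    # Receive the first data packet, with or without acknowledgement.
    if s.D:
        head, tail = s.D[0], s.D[1:]
        if s.U + head >= threshold:
            yield "recv_ack", replace(s, A=s.A + (s.U + head,), D=tail, U=0)
        else:
            yield "recv", replace(s, U=s.U + head, D=tail)
    # One non-deterministic out-of-order (timer) acknowledgement.
    if s.U > 0 and s.ACK == 0:
        yield "timer_ack", replace(s, A=s.A + (s.U,), U=0, ACK=1)
    # Process acknowledgement.
    if s.A:
        yield "process_ack", replace(s, A=s.A[1:])


def reachable(S, R, MSS):
    """Breadth-first exploration from the all-zero initial configuration."""
    seen, queue = {State()}, deque([State()])
    while queue:
        s = queue.popleft()
        for _, nxt in successors(s, S, R, MSS):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In this sketch, a reachable state for which successors yields nothing has no enabled event and would therefore show up as a single-state tSCC.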

Fig. 5. The bifurcation diagrams showing the character of tSCCs depending on the
model parameters in the TCP model. The white space indicates a single large tSCC; 
the other colours indicate the regions displaying various types of single state tSCCs. 
(a) MSS = 9204, 1 KiB increments of S and R; (b) MSS = 9204, 8 KiB increments of 
S and R; (c) MSS = 1460, 1 KiB increments of S and R. (Color figure online) 


Analysis Results. In our analysis, we assume the buffer sizes S and R ranging
from 1 KiB to 64 KiB in 1 KiB increments. First, we consider MSS to be 9204,
as in [28]. This MSS configuration corresponds to a specific high-performance
network and is not used in typical Ethernet configurations.

The complete results of our analysis are presented in Fig. 5a. In contrast to
the previous case study, we consider the presence of a single large tSCC
as the desired behaviour (depicted in white). In this case, the situation indicates
that the protocol is functioning properly. On the other hand, the presence of
a small, single state tSCC means that the protocol cannot continue transmitting
and is waiting for a time-out to resolve the problematic situation.

Additionally, based on enabling and disabling various extensions of the
protocol model, we can distinguish between different bifurcation causes:


– Delayed acknowledgement (DA): With the delayed acknowledgement
employed exclusively, the parametrisations satisfying S < 0.35 · R ∧ S < 2 · MSS
can never trigger the automatic acknowledgement and thus rely on the
acknowledgement time-out instead. The corresponding regions are depicted
in green in Fig. 5.

– Combination of DA and Nagle’s algorithm: A single state tSCC emerges
whenever the amount of data necessary to trigger the next automatic
acknowledgement cannot be sent due to Nagle’s condition. The corresponding
regions are depicted in blue in Fig. 5.

– Combination of DA, Nagle’s algorithm, and cyclic timer: The regions depicted
in red in Fig. 5 correspond to single state tSCCs appearing only when all the
three extensions are enabled. The reason is that while delayed acknowledgement
and Nagle’s algorithm can coexist well under these parametrisations,
the cyclic timer can cause transmissions of small packets, which is not possible
in the cases above.
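
The first condition in the list is easy to check directly. Since the outstanding bytes never exceed min(S, R) ≤ S, a send buffer with S < 0.35 · R and S < 2 · MSS can never deliver enough data to reach the acknowledgement threshold. The sketch below evaluates this condition on the 1 KiB grid used in the analysis; the function name and the grid loop are illustrative assumptions, and all sizes are taken in bytes.

```python
KIB = 1024


def delayed_ack_never_fires(S, R, MSS):
    """True if S < 0.35 * R and S < 2 * MSS (all sizes in bytes)."""
    return S < 0.35 * R and S < 2 * MSS


# Grid of buffer sizes from 1 KiB to 64 KiB in 1 KiB increments, MSS = 9204.
MSS = 9204
da_region = [(s, r) for s in range(1, 65) for r in range(1, 65)
             if delayed_ack_never_fires(s * KIB, r * KIB, MSS)]
```

Every pair in da_region has S well below R, which matches the condition S < 0.35 · R.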

In the case of S < R, the achieved results are in line with the findings of [28].
However, in the R > S area, we observe a bifurcation caused by the interplay of 
delayed acknowledgement and Nagle’s algorithm which has not been considered 
in the original paper. This bifurcation is caused by small packets sent right 
after an acknowledgement is received. The small packet is transmitted after the 
acknowledgement clears the outstanding bytes (so Nagle’s condition holds), but 
before more data is copied into the send buffer (before the acknowledgement was 
received, the send buffer was full). 

In [28], the situation might have been avoided by some undisclosed
implementation or timing aspects. However, another possible explanation is that
this behaviour has been overlooked because such issues never occurred during the
experiments. In Fig. 5b, we present our reconstruction of the same results, but
in 8 KiB increments. It corresponds exactly to the experimental evaluation
presented in [28]. The described behaviour is absent in this case, since the 8 KiB
increments avoid the problematic region entirely.

Finally, in Fig. 5c, we present the same analysis for the maximal buffer size
of 32 KiB and MSS of 1460 bytes, which is the typical setting on an Ethernet
network. In this case, the red region is completely absent, and while other
bifurcations are still present, the problematic regions are much smaller due to
the smaller MSS. This puts into perspective the drastic behavioural changes
present for larger MSS values and shows how bifurcations can emerge in
unexpected situations.


4 Discussion and Conclusion 


In this paper, we have presented two case studies demonstrating a promising
application of the digital bifurcation analysis in the domain of network protocols.
To that end, we have utilised the methodology developed in our previous work.

The key aspects of the method as applied in this paper are the following.
First, it gives rigorous results concerning the given models of the studied
protocol. Second, it can be performed fully automatically. In general, the only
tasks that have to be done manually are to acquire a suitable model and to
post-process the results (including visualisation and interpretation). The crucial
step to be done within the latter task is to classify the studied protocol
properties in terms of attractors. However, this can be easily automated since
the interest of a network administrator (or a designer) is primarily focused on
parameter values for which the stable behaviour (a single simple attractor)
disappears.

Both case studies show that the digital bifurcation analysis provides a
methodologically different view on the protocol analysis than formal verification
or testing. This is made possible by the global view of the protocol behaviour
with respect to parameters that the method provides. Due to the global approach,
in the second case study, we have revealed regions in bifurcation diagrams that
were omitted in previous studies.

The push-button character of the digital bifurcation analysis makes the results
easily reproducible. All steps necessary to reconstruct both case studies are
publicly available.

For future work, our primary intention is to target similar, but not yet fully 
explored, problems in network protocols using digital bifurcation analysis that 
will allow further fine-tuning (and generalisation) of the presented workflow. 


Abstract. Many fault-tolerant distributed algorithms are designed for
synchronous or round-based semantics. In this paper, we introduce the 
synchronous variant of threshold automata, and study their applicability 
and limitations for the verification of synchronous distributed algorithms. 
We show that in general, the reachability problem is undecidable for 
synchronous threshold automata. Still, we show that many synchronous 
fault-tolerant distributed algorithms have a bounded diameter, although 
the algorithms are parameterized by the number of processes. Hence, we 
use bounded model checking for verifying these algorithms. 

The existence of bounded diameters is the main conceptual insight in 
this paper. We compute the diameter of several algorithms and check 
their safety properties, using SMT queries that contain quantifiers for 
dealing with the parameters symbolically. Surprisingly, the performance of
the SMT solvers on these queries is very good, reflecting the recent
progress in dealing with quantified queries. We found that the diameter
bounds of synchronous algorithms in the literature are tiny (from 1
to 4), which makes our approach applicable in practice. For a specific
class of algorithms we also establish a theoretical result on the existence
of a diameter, providing a first explanation for our experimental results.
The encodings of our benchmarks and instructions on how to run the 
experiments are available at: [33]. 


1 Introduction 


Fault-tolerant distributed algorithms are hard to design and verify. Recently,
threshold automata were introduced to model, verify and synthesize
asynchronous fault-tolerant distributed algorithms [19,21,24]. Owing to the
well-known impossibility result [18], many distributed computing problems,
including consensus, are not solvable in purely asynchronous systems. Thus,
synchronous distributed algorithms have been extensively studied [5,26]. In
this paper, we introduce synchronous threshold automata, and investigate their
applicability and limitations for verification of synchronous fault-tolerant
distributed algorithms.


 1 int v:=input({0, 1})
 2 bool accept:=false
 3 while (true) do { // in one synchronous step
 4   if (v = 1) then broadcast <ECHO>;
 5   receive messages from other processes;
 6   if received <ECHO> from ≥ t + 1 processes
 7   then v:=1;
 8   if received <ECHO> from ≥ n − t processes
 9   then accept:=true;
10 }


Fig. 1. Pseudo code of synchronous reliable broadcast à la [32], and its STA, with
guards: $\varphi_1 \equiv \#\{\mathrm{v1}, \mathrm{SE}, \mathrm{AC}\} \ge t + 1 - f$ and
$\varphi_2 \equiv \#\{\mathrm{v1}, \mathrm{SE}, \mathrm{AC}\} \ge n - t - f$ and
$\varphi_3 \equiv \#\{\mathrm{v1}, \mathrm{SE}, \mathrm{AC}\} < t + 1$ and
$\varphi_4 \equiv \#\{\mathrm{v1}, \mathrm{SE}, \mathrm{AC}\} < n - t$.


An example of such a synchronous threshold automaton is given in Fig. 1
on the right; it encodes the synchronous reliable broadcast algorithm from [32].
(The pseudo code is in Fig. 1 on the left.) Its semantics is defined in terms of a
counter system. For each location $\ell_i \in \{\mathrm{v0}, \mathrm{v1}, \mathrm{SE}, \mathrm{AC}\}$
(a node in the graph), we have a counter $\kappa_i$ that stores the number of
processes that are in $\ell_i$. The system is parameterized in two ways: (i) in the
number of processes n, the number of faults f, and the upper bound on the number
of faults t, (ii) the expressions in the guards contain n, t, and f. Every
transition moves all processes simultaneously, potentially using a different rule
for each process (depicted by an edge in the figure), provided that the rule
guards evaluate to true. The guards compare a sum of counters to a linear
combination of parameters. For example, the guard
$\varphi_1 \equiv \#\{\mathrm{v1}, \mathrm{SE}, \mathrm{AC}\} \ge t + 1 - f$ evaluates to true
if the number of processes that are either in location v1, SE, or AC is greater
than or equal to t + 1 − f.
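
To make the guard evaluation concrete, the following Python sketch evaluates the four guards from the caption of Fig. 1 on a counter vector. The function name, the dictionary layout and the chosen parameter values are illustrative assumptions, not part of the paper.

```python
def guards(counters, n, t, f):
    """Evaluate the four guards of Fig. 1 on a vector of location counters."""
    echoing = counters["v1"] + counters["SE"] + counters["AC"]
    return {
        "phi1": echoing >= t + 1 - f,
        "phi2": echoing >= n - t - f,
        "phi3": echoing < t + 1,
        "phi4": echoing < n - t,
    }


# n = 4, t = 1, f = 1: three correct processes, one of which starts in v1.
result = guards({"v0": 2, "v1": 1, "SE": 0, "AC": 0}, n=4, t=1, f=1)
```

For this configuration the guards phi1 and phi3 hold at the same time, so the counter values alone do not determine which of the two corresponding branches a process takes.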

Synchronous Threshold Automata (STA) model synchronous fault-tolerant
distributed algorithms as follows. As processes send messages based on their
current locations, we use the number of processes in given locations to test how
many messages of a certain type have been sent. However, the pseudo code in
Fig. 1 is predicated by received messages rather than by sent messages. This
algorithm is designed to tolerate Byzantine-faulty processes, which may send
spurious messages to some correct processes. Thus, the number of received
messages may deviate from the number of correct processes that sent a message.
For example, if the guard in line 7 evaluates to true, the t + 1 received messages
may contain up to f messages from faulty processes. If $i$ correct processes send
<ECHO>, for $1 \le i \le t$, the faulty processes may “help” some correct processes
to pass over the t + 1 threshold. In the STA, this is modeled by both the rules
$r_1$ and $r_2$ being enabled. Thus, the assignment v:=1 in line 7 is modeled by the
rule $r_2$, guarded by $\varphi_1$. The implicit “else” branch between lines 7 and 8 is
modeled by the rule $r_1$, guarded by $\varphi_3$. As the effect of the f faulty
processes on the correct processes is captured by the guards, we model only the
correct processes explicitly, so that a system consists of n − f copies of the STA.


Table 1. A long execution of reliable broadcast and the short representative

Long execution (left):

Process | $\sigma_0$ | $\sigma_1$ | $\sigma_2$ | ... | $\sigma_{t+1}$ | $\sigma_{t+2}$ | $\sigma_{t+3}$
1       | v1 | SE | SE | ... | SE | SE | AC
2       | v0 | v0 | SE | ... | SE | SE | AC
t+1     | v0 | v0 | v0 | ... | SE | SE | AC
n−f     | v0 | v0 | v0 | ... | v0 | SE | AC

Short representative (right):

Process | $\sigma'_0$ | $\sigma'_1$ | $\sigma'_2$
1       | v1 | SE | AC
2       | v0 | SE | AC
t+1     | v0 | SE | AC
n−f     | v0 | SE | AC


Contributions. We start by introducing synchronous threshold automata (STA) 
and the counter systems they define. 


1. We show that parameterized reachability checking of STA is undecidable. 

2. We introduce an SMT-based procedure for finding the diameter of the counter 
system associated with an STA, i.e., the number of steps in which every 
configuration of the counter system is reachable. Knowing the diameter, we can
use bounded model checking as a complete verification method [11,14,22].

3. For a class of STA that captures several algorithms such as the broadcast 
algorithm in Fig. 1, we prove that a diameter is always bounded. The diameter 
is a function of the number of guard expressions and the longest path in the 
automaton, that is, it is independent of the parameters. 

4. We implemented our technique, by running Z3 [29] and CVC4 [7] as back-end 
SMT solvers, and evaluated it by applying it to several distributed algorithms 
from the literature: Benchmarks that tolerate Byzantine faults from [8-10, 
32], benchmarks that tolerate crashes from [13,26,30], and benchmarks that 
tolerate send omissions from [10,30]. 

5. We are the first to automatically verify the Byzantine and send omission 
benchmarks. For the crash benchmarks, our method performs significantly 
better than the abstraction-based method in [3]. By tweaking the constraints 
on the parameters n, t, f, we introduce configurations with more faults than 
expected, for which our technique automatically finds a counterexample. 


2 Overview of Our Approach 


Bounded Diameter. Consider Fig. 1: the processes execute the send, receive, and
local computation steps in lock-step. One iteration of the loop is expressed as
an STA edge that connects the locations before and after an iteration (i.e., the
STA models the loop body of the pseudo code). The location SE encodes that
v = 1 and accept is false. That is, SE is the location in which processes send
<ECHO> in every round. If a process sets accept to true, it goes to location AC.
The location where v is 1 is encoded by v1, and the one where v is 0 by v0.

360 I. Stoilkovska et al.

An example execution is depicted in Table 1 on the left. We run n − f copies
of the STA in Fig. 1. Observe that the guards of the rules $r_1$ and $r_2$ are both
enabled in the configuration $\sigma_0$. One STA uses $r_2$ to go to SE while the others
use the self-loop $r_1$ to stay in v0. As both rules remain enabled, in every round
one more automaton can go to SE. Hence, configuration $\sigma_{t+1}$ has t + 1 correct
STA in location SE and rule $r_1$ becomes disabled. Then, all remaining STA go
to SE and then finally to AC. This execution depends on the parameter t, which
implies that the length of this execution is unbounded for increasing values of
the parameter t. (We note that we can obtain longer executions, if some STA use
rule $r_4$.) On the right, we see an execution where all STA take $r_2$ immediately.
That is, while configuration $\sigma_{t+3}$ is reached by a long execution on the left, it
is reached in just two steps on the right (observe $\sigma'_2 = \sigma_{t+3}$). We are
interested in whether there is a natural number k (which does not depend on the
parameters n, t and f) such that we can always shorten executions to executions
of length $\le k$. (By length, we mean the number of transitions in an execution.)
In such a case we say that the STA has bounded diameter. In Sect. 5.1 we
introduce an SMT-based procedure that enumerates candidates for the diameter
bound and checks if the candidate is indeed the diameter; if it finds such a
bound, it terminates. For the STA in Fig. 1, this procedure computes the
diameter 2.


Threshold Automata with Traps. In Sect. 5.2, we define a fragment of STA for
which we theoretically guarantee a bounded diameter. For example, the STA in
Fig. 1 falls in this fragment, and we obtain a guaranteed diameter of $\le 8$. The
fragment is defined by two conditions: (i) The STA has a structure that implies
monotonicity of the guards: the set of locations that are used in the guards (e.g.,
{v1, SE, AC}) is closed under the rules, i.e., from each location within the set,
the STA can reach only a location in the set. We call guards that have this
property trapped. (ii) The STA has no cycles, except possibly self-loops.


Bounded Model Checking, Completeness and (Un-)Decidability. The existence of
a bounded diameter motivates the use of bounded model checking for verifying
safety properties. In Sect. 6 we give an SMT encoding for checking the violation
of a safety property by executions with length up to the diameter. Crucially,
this approach is complete because if an execution reaches a bad configuration,
this bad configuration is already reached by an execution of bounded length. We
observe that for the STA defined in this paper (with linear guards and linear
constraints on the parameters), the SMT encoding results in a Presburger
arithmetic formula (with one quantifier alternation). Hence, checking safety
properties (that can be expressed in Presburger arithmetic) is decidable for STA
with bounded diameter. We also experimentally demonstrate in Sect. 7 that
current SMT solvers can handle these quantified formulae well. On the contrary,
we show in Sect. 4 that the parameterized reachability problem is undecidable for
general STA. This implies that there are STA with unbounded diameter.

Fig. 2. Pseudo code of FloodMin from [13], and STA encoding its loop body, for k = 1,
with guards: $\varphi_1 \equiv \#\{\mathrm{v0}, \mathrm{c0}\} > 0$ and $\varphi_2 \equiv \#\{\mathrm{v0}\} = 0$.


Threshold Automata with Untrapped Guards. The FloodMin algorithm in Fig. 2
solves the k-set agreement problem. This algorithm is run by n replicated
processes, up to t of which may fail by crashing. For simplicity of presentation,
we consider the case when k = 1, which turns k-set agreement into consensus. In
Fig. 2, on the right, we have the STA that captures the loop body. The locations
c0 and c1 correspond to the case when a process is crashing in the current round
and may manage to send the value 0 and 1 respectively; the process remains in
the crashed location and does not send any messages starting with the next
round. We observe that the guard $\#\{\mathrm{v0}, \mathrm{c0}\} > 0$ is not trapped, and our
result about trapped guards does not apply. Nevertheless, our SMT-based
procedure can find a diameter of 2. In the same way, we automatically found a
bound on the diameter for several benchmarks from the literature. It is
remarkable that the diameter for the transition relation of the loop body
(without the loop condition) is bounded by a constant, independent of the
parameters.


Bounded Model Checking of Algorithms with Clean Rounds. The number of loop
iterations $\lfloor t/k \rfloor + 1$ of the FloodMin algorithm has been designed such that
it ensures (together with the environment assumption of at most t crashes) that
there is at least one clean round in which at most k − 1 processes crashed. The
correctness of the FloodMin algorithm relies on the occurrence of such a clean
round. We make use of the existence of clean rounds by employing the following
two-step methodology for the verification of safety properties: (i) we find all
reachable clean-round configurations, and (ii) check if a bad configuration is
reachable from those configurations. A detailed description of this methodology
can be found in Sect. 6. Our method requires the encoding of a clean round as
input (e.g., for Fig. 2 that no STA are in c0 and c1). We leave detecting and
encoding clean rounds automatically from the fault environment for future work.
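
The counting argument behind the existence of a clean round can be spelled out as a short derivation. It is a sketch under the assumption that every crash is attributed to exactly one round. Suppose that each of the $\lfloor t/k \rfloor + 1$ rounds had at least $k$ crashes; then the total number of crashes would satisfy

```latex
\#\text{crashes} \;\ge\; k \left( \left\lfloor \frac{t}{k} \right\rfloor + 1 \right)
\;>\; k \cdot \frac{t}{k} \;=\; t ,
\qquad \text{since } \left\lfloor \frac{t}{k} \right\rfloor + 1 > \frac{t}{k} .
```

This contradicts the assumption of at most $t$ crashes, so at least one round has at most $k - 1$ crashes.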


3 Synchronous Threshold Automata 


We introduce the syntax of synchronous threshold automata and give some
intuition of the semantics, which we will formalize as counter systems below.

A synchronous threshold automaton is the tuple
$\mathrm{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, where $\mathcal{L}$ is a finite set
of locations, $\mathcal{I} \subseteq \mathcal{L}$ is a non-empty set of initial locations, $\Pi$
is a finite set of parameters, $\mathcal{R}$ is a finite set of rules, $RC$ is a resilience
condition, and $\chi$ is a counter invariant, defined in the following. We assume
that the set $\Pi$ of parameters contains at least the parameter $n$, denoting the
number of processes. We call the vector $\pi = \langle \pi_1, \dots, \pi_{|\Pi|} \rangle$ the
parameter vector, and a vector $\mathbf{p} = \langle p_1, \dots, p_{|\Pi|} \rangle$ is an instance
of $\pi$, where $\pi_i \in \Pi$ is a parameter, and $p_i \in \mathbb{N}$ is a natural number,
for $1 \le i \le |\Pi|$, such that $\mathbf{p}[\pi_i] = p_i$ is the value assigned to the
parameter $\pi_i$ in the instance $\mathbf{p}$ of $\pi$. The set of admissible instances of
$\pi$ is defined as $\mathbf{P}_{RC} = \{\mathbf{p} \in \mathbb{N}^{|\Pi|} \mid \mathbf{p}$ is an instance of $\pi$
and $\mathbf{p}$ satisfies $RC\}$. The mapping $N : \mathbf{P}_{RC} \to \mathbb{N}$ maps an admissible
instance $\mathbf{p} \in \mathbf{P}_{RC}$ to the number $N(\mathbf{p})$ of processes that participate
in the algorithm, such that $N(\mathbf{p})$ is a linear combination of the parameter
values in $\mathbf{p}$.

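For example, for the STA in Fig. 1, $RC \equiv n > 3t \wedge t \ge f$, hence a vector
$\mathbf{p} \in \mathbb{N}^{|\Pi|}$ is an admissible instance of the parameter vector
$\pi = \langle n, t, f \rangle$, if $\mathbf{p}[n] > 3\mathbf{p}[t] \wedge \mathbf{p}[t] \ge \mathbf{p}[f]$.
Furthermore, for this STA, $N(\mathbf{p}) = \mathbf{p}[n] - \mathbf{p}[f]$. For the STA in Fig. 2,
$RC \equiv n > t \wedge t \ge f$, hence the admissible instances satisfy
$\mathbf{p}[n] > \mathbf{p}[t] \wedge \mathbf{p}[t] \ge \mathbf{p}[f]$, and we have $N(\mathbf{p}) = \mathbf{p}[n]$.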
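
The two resilience conditions and the corresponding process counts can be written down directly. The Python sketch below does so for parameter instances given as dictionaries; the function names are illustrative assumptions, and the condition t ≥ f is read as non-strict.

```python
def admissible_rb(p):
    """Resilience condition of the STA in Fig. 1: n > 3t and t >= f."""
    return p["n"] > 3 * p["t"] and p["t"] >= p["f"]


def processes_rb(p):
    """N(p) for Fig. 1: only the n - f correct processes are modelled."""
    return p["n"] - p["f"]


def admissible_floodmin(p):
    """Resilience condition of the STA in Fig. 2: n > t and t >= f."""
    return p["n"] > p["t"] and p["t"] >= p["f"]


def processes_floodmin(p):
    """N(p) for Fig. 2: all n processes are modelled."""
    return p["n"]
```

For instance, the instance n = 4, t = 1, f = 1 is admissible for both automata, whereas n = 3, t = 1, f = 1 is admissible only for the STA in Fig. 2.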

We introduce counter atoms of the form $\psi \equiv \#L \ge \mathbf{a} \cdot \pi + b$, where
$L \subseteq \mathcal{L}$ is a set of locations, $\#L$ denotes the total number of processes
currently in the locations $\ell \in L$, $\mathbf{a} \in \mathbb{Z}^{|\Pi|}$ is a vector of
coefficients, $\pi$ is the parameter vector, and $b \in \mathbb{Z}$. We will use the counter
atoms for expressing guards and predicates in the verification problem. In the
following, we will use two abbreviations: $\#L = \mathbf{a} \cdot \pi + b$ for the formula
$(\#L \ge \mathbf{a} \cdot \pi + b) \wedge \neg(\#L \ge \mathbf{a} \cdot \pi + b + 1)$, and
$\#L > \mathbf{a} \cdot \pi + b$ for the formula $\#L \ge \mathbf{a} \cdot \pi + b + 1$.

A rule $r \in \mathcal{R}$ is the tuple $(\mathit{from}, \mathit{to}, \varphi)$, where
$\mathit{from}, \mathit{to} \in \mathcal{L}$ are locations, and $\varphi$ is a guard whose truth value
determines if the rule $r$ is executed. The guard $\varphi$ is a Boolean combination of
counter atoms. We denote by $\Psi$ the set of counter atoms occurring in the guards
of the rules $r \in \mathcal{R}$.

The counter invariant $\chi$ is a Boolean combination of counter atoms
$\#L \ge \mathbf{a} \cdot \pi + b$, where each atom occurring in $\chi$ restricts the number of
processes allowed to populate the locations in $L \subseteq \mathcal{L}$.


Counter Systems. The counter atoms are evaluated over tuples $(\kappa, \mathbf{p})$,
where $\kappa \in \mathbb{N}^{|\mathcal{L}|}$ is a vector of counters, and $\mathbf{p} \in \mathbf{P}_{RC}$ is an
admissible instance of $\pi$. For a location $\ell \in \mathcal{L}$, the counter $\kappa[\ell]$
denotes the number of processes that are currently in the location $\ell$. A counter
atom $\psi \equiv \#L \ge \mathbf{a} \cdot \pi + b$ is satisfied in the tuple $(\kappa, \mathbf{p})$, that is
$(\kappa, \mathbf{p}) \models \psi$, iff $\sum_{\ell \in L} \kappa[\ell] \ge \mathbf{a} \cdot \mathbf{p} + b$. The semantics
of the Boolean connectives is standard.

A transition is a function $t : \mathcal{R} \to \mathbb{N}$ that maps a rule $r \in \mathcal{R}$ to a
factor $t(r) \in \mathbb{N}$, denoting the number of processes that act upon this rule.
Given an instance $\mathbf{p}$ of $\pi$, we denote by $T(\mathbf{p})$ the set
$\{t \mid \sum_{r \in \mathcal{R}} t(r) = N(\mathbf{p})\}$ of transitions whose rule factors sum up
to $N(\mathbf{p})$.

Given a tuple $(\kappa, \mathbf{p})$ and a transition $t$, we say that $t$ is enabled in
$(\kappa, \mathbf{p})$, if


1. for every $r \in \mathcal{R}$, such that $t(r) > 0$, it holds that $(\kappa, \mathbf{p}) \models r.\varphi$, and
2. for every $\ell \in \mathcal{L}$, it holds that $\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} t(r)$.

The first condition ensures that processes only use rules whose guards are
satisfied, and the second that every process moves in an enabled transition.

Observe that each transition $t \in T(\mathbf{p})$ defines a unique tuple $(\kappa, \mathbf{p})$ in
which it is enabled. We call the origin of a transition $t \in T(\mathbf{p})$ the tuple
$o(t) = (\kappa, \mathbf{p})$, such that for every $\ell \in \mathcal{L}$, we have
$o(t).\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} t(r)$. Similarly, each transition
defines a unique tuple $(\kappa, \mathbf{p})$ that is the result of applying the transition
in its origin. We call the goal of a transition $t \in T(\mathbf{p})$ the tuple
$g(t) = (\kappa, \mathbf{p})$, such that for every $\ell \in \mathcal{L}$, we have
$g(t).\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{to} = \ell} t(r)$.
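
The origin, the goal and the enabledness test translate almost literally into code. The following Python sketch uses a hypothetical two-location automaton; the rule encoding as (name, from, to, guard) tuples, the guard signature and all names are assumptions made for illustration only.

```python
from collections import Counter


def origin(rules, t):
    """Counters of the unique configuration in which t can be enabled."""
    kappa = Counter()
    for name, src, _dst, _guard in rules:
        kappa[src] += t.get(name, 0)
    return kappa


def goal(rules, t):
    """Counters that result from applying t in its origin."""
    kappa = Counter()
    for name, _src, dst, _guard in rules:
        kappa[dst] += t.get(name, 0)
    return kappa


def enabled(rules, t, kappa, params):
    """Condition 1: guards of all used rules hold; condition 2: counters match."""
    guards_ok = all(guard(kappa, params)
                    for name, _src, _dst, guard in rules if t.get(name, 0) > 0)
    src_counts = origin(rules, t)
    locations = set(kappa) | {loc for _n, src, dst, _g in rules for loc in (src, dst)}
    counts_ok = all(kappa.get(loc, 0) == src_counts[loc] for loc in locations)
    return guards_ok and counts_ok


# Hypothetical automaton: "move" is guarded by the counter atom #{a} >= threshold.
rules = [
    ("stay", "a", "a", lambda kappa, params: True),
    ("move", "a", "b", lambda kappa, params: kappa["a"] >= params["threshold"]),
    ("rest", "b", "b", lambda kappa, params: True),
]
t = {"stay": 1, "move": 2, "rest": 1}  # four processes in total
```

Here the transition t has its origin in the configuration with three processes in a and one in b, and its goal in the configuration with one process in a and three in b.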

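We now define a counter system, for a given
$\mathrm{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, and an admissible instance
$\mathbf{p} \in \mathbf{P}_{RC}$ of the parameter vector $\pi$.


Definition 1. A counter system w.r.t. $\mathrm{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$
and an admissible instance $\mathbf{p} \in \mathbf{P}_{RC}$ is the tuple
$CS(\mathrm{STA}, \mathbf{p}) = (\Sigma(\mathbf{p}), I(\mathbf{p}), R(\mathbf{p}))$, where


- $\Sigma(\mathbf{p}) = \{\sigma = (\kappa, \mathbf{p}) \mid \sum_{\ell \in \mathcal{L}} \sigma.\kappa[\ell] = N(\mathbf{p})$ and $\sigma \models \chi\}$ are the configurations;
- $I(\mathbf{p}) = \{\sigma \in \Sigma(\mathbf{p}) \mid \sum_{\ell \in \mathcal{I}} \sigma.\kappa[\ell] = N(\mathbf{p})\}$ are the initial configurations;
- $R(\mathbf{p}) \subseteq \Sigma(\mathbf{p}) \times T(\mathbf{p}) \times \Sigma(\mathbf{p})$ is the transition relation, with
  $\langle \sigma, t, \sigma' \rangle \in R(\mathbf{p})$, if $\sigma$ is the origin and $\sigma'$ the goal of $t$.
  We write $\sigma \xrightarrow{t} \sigma'$, if $\langle \sigma, t, \sigma' \rangle \in R(\mathbf{p})$.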


We restrict ourselves to deadlock-free counter systems, i.e., counter systems
where the transition relation is total (every configuration has a successor). A
sufficient condition for deadlock-freedom is that for every location
$\ell \in \mathcal{L}$, it holds that $\chi \rightarrow \bigvee_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} r.\varphi$.
This ensures that it is always possible to move out of every location, as there
is at least one outgoing rule per location whose guard is satisfied.

To simplify the notation, in the following we write $\sigma[\ell]$ to denote $\sigma.\kappa[\ell]$.


Paths and Schedules in a Counter System. We now define paths and schedules 
of a counter system, as sequences of configurations and transitions, respectively. 


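Definition 2. A path in the counter system
$CS(\mathrm{STA}, \mathbf{p}) = (\Sigma(\mathbf{p}), I(\mathbf{p}), R(\mathbf{p}))$ is a finite sequence
$\{\sigma_i\}_{i=0}^{k}$ of configurations, such that for every two consecutive
configurations $\sigma_{i-1}, \sigma_i$, for $0 < i \le k$, there exists a transition
$t_i \in T(\mathbf{p})$ such that $\sigma_{i-1} \xrightarrow{t_i} \sigma_i$. A path $\{\sigma_i\}_{i=0}^{k}$
is called an execution if $\sigma_0 \in I(\mathbf{p})$.

Definition 3. A schedule is a finite sequence $\tau = \{t_i\}_{i=1}^{k}$ of transitions
$t_i \in T(\mathbf{p})$, for $0 < i \le k$. We denote by $|\tau| = k$ the length of the
schedule $\tau$.

A schedule $\tau = \{t_i\}_{i=1}^{k}$ is feasible if there is a path $\{\sigma_i\}_{i=0}^{k}$
such that $\sigma_{i-1} \xrightarrow{t_i} \sigma_i$, for $0 < i \le k$. We call $\sigma_0$ the origin, and
$\sigma_k$ the goal of $\tau$, and write $\sigma_0 \xrightarrow{\tau} \sigma_k$.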


4 Parameterized Reachability and Its Undecidability 


We show that the following problem is undecidable in general, by reduction from 
the halting problem of a two-counter machine (2CM) [28]. Such reductions are 
common in parameterized verification, e.g., see [12]. 

Definition 4 (Parameterized Reachability). Given a formula $\varphi$, that is,
a Boolean combination of counter atoms, and
$\mathrm{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, the parameterized reachability
problem is to decide whether there exists an admissible instance
$\mathbf{p} \in \mathbf{P}_{RC}$, such that in the counter system $CS(\mathrm{STA}, \mathbf{p})$, there is
an initial configuration $\sigma \in I(\mathbf{p})$, and a feasible schedule $\tau$, with
$\sigma \xrightarrow{\tau} \sigma'$ and $\sigma' \models \varphi$.


To prove undecidability, we construct a synchronous threshold automaton
$\mathrm{STA}_M$, such that every counter system induced by it simulates the steps of a
2CM executing a program P. The STA has a single parameter, the number n of
processes, and the invariant $\chi = \mathit{true}$. The idea is that each process plays one
of two roles: either it is used to encode the control flow of the program P
(controller role), or to encode the values of the registers in unary, as in [17]
(storage role). Thus, $\mathrm{STA}_M$ consists of two parts, one for each role.

Our construction allows multiple processes to act as controllers. Since we
assume that the 2CM is deterministic, all the controllers behave the same. For
each instruction of the program P, in the controller part of $\mathrm{STA}_M$, there is a
single location (for ‘jump if zero’ and ‘halt’) or a pair of locations (for
‘increment’ and ‘decrement’), and a special stuck location. In the storage part of
$\mathrm{STA}_M$, there is a location for each register, a store location, and auxiliary
locations. The number of processes in a register location encodes the value of
the register in the 2CM.

An increment (resp. decrement) of a register is modeled by moving one
process from (resp. to) the store location to (resp. from) the register location.
The guards on the rules in the controller part check if the storage processes made
a transition that truly models a step of the 2CM; in this case, the controllers
move on to the next location, otherwise they move to the stuck location. For
example, to model a ‘jump if zero’ for register A, the controllers check if
$\#\{\ell_A\} = 0$, where $\ell_A$ is the storage location corresponding to register A.
The main invariant which ensures correctness is that every transition in every
counter system induced by $\mathrm{STA}_M$ either faithfully simulates a step of the 2CM,
or moves all of the controllers to the stuck location.

Let $\ell_{\mathit{halt}}$ be the halting location in the controller part of $\mathrm{STA}_M$. The
formula $\varphi \equiv \neg(\#\{\ell_{\mathit{halt}}\} = 0)$ states that the controllers have reached the
halting location. Thus, the answer to the parameterized reachability question
given the formula $\varphi$ and $\mathrm{STA}_M$ is positive iff the 2CM halts, which gives us
undecidability.


5 Bounded Diameter Oracle 


5.1 Computing the Diameter Using SMT 


Given an STA, the diameter is the maximal number of transitions needed to reach
all possible configurations in every counter system induced by the STA, and an
admissible instance $\mathbf{p} \in \mathbf{P}_{RC}$. We adapt the definition of diameter from [11].


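Definition 5 (Diameter). Given an $\mathrm{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, the
diameter is the smallest number $d$ such that for every $\mathbf{p} \in \mathbf{P}_{RC}$ and every
path $\{\sigma_i\}_{i=0}^{d+1}$ of length $d + 1$ in $CS(\mathrm{STA}, \mathbf{p})$, there exists a path
$\{\sigma'_i\}_{i=0}^{e}$ of length $e \le d$ in $CS(\mathrm{STA}, \mathbf{p})$, such that
$\sigma_0 = \sigma'_0$ and $\sigma_{d+1} = \sigma'_e$.

Verifying Safety of Synchronous Fault-Tolerant Algorithms by BMC 365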


Thus, the diameter is the smallest number d that satisfies the formula: 


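$$
\begin{aligned}
&\forall p \in \mathbf{P}_{RC}.\ \forall \sigma_0, \dots, \sigma_{d+1}.\ \forall t_1, \dots, t_{d+1}.\ \exists \sigma'_0, \dots, \sigma'_d.\ \exists t'_1, \dots, t'_d.\\
&\quad \mathit{Path}(\sigma_0, \sigma_{d+1}, d+1) \rightarrow (\sigma_0 = \sigma'_0) \wedge \mathit{Path}(\sigma'_0, \sigma'_d, d) \wedge \bigvee_{i=0}^{d} \sigma'_i = \sigma_{d+1}
\end{aligned}
\tag{1}
$$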


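where $\mathit{Path}(\sigma_0, \sigma_d, d)$ is a shorthand for the formula $\bigwedge_{i=0}^{d-1} R(\sigma_i, t_{i+1}, \sigma_{i+1})$, and
$R(\sigma, t, \sigma')$ is a predicate which evaluates to true whenever $\sigma \xrightarrow{t} \sigma'$. Since we
assume deadlock-freedom, we are able to encode the path $\mathit{Path}(\sigma'_0, \sigma'_d, d)$ of
length $d$, even if the disjunct $\sigma'_i = \sigma_{d+1}$ already holds for some $i < d$.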

Formula (1) gives us the following procedure to determine the diameter: 


1. initialize the candidate diameter d to 1;
2. check if the negation of the formula (1) is unsatisfiable;
3. if yes, then output d and terminate;
4. if not, then increment d and jump to step 2.




If the procedure terminates, it outputs the diameter, which can be used
as a completeness threshold for bounded model checking. We implemented this
procedure, and used a back-end SMT solver to automate the test in step 2.
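
The loop can be illustrated on a single, explicitly enumerated counter system. The sketch below is not the SMT-based implementation described in this section: it assumes one fixed parameter instance, represents the transition relation as a hypothetical dictionary `succ` from configurations to successor sets, and replaces the solver query by brute-force enumeration. The names `step`, `condition_holds`, `diameter`, and the system `TOY` are illustrative only.

```python
def step(succ, configs):
    """Configurations reachable from `configs` in exactly one transition."""
    return {t for s in configs for t in succ[s]}


def condition_holds(succ, d):
    """Explicit-state analogue of formula (1) for a candidate d.

    For every start configuration, anything reachable in exactly d + 1
    transitions must also be reachable in at most d transitions.
    """
    for start in succ:
        within = {start}      # reachable in at most i transitions
        frontier = {start}    # reachable in exactly i transitions
        for _ in range(d):
            frontier = step(succ, frontier)
            within |= frontier
        if not step(succ, frontier) <= within:
            return False
    return True


def diameter(succ, max_d=64):
    """Increment d from 1 until the condition holds (cut off at max_d)."""
    d = 1
    while d <= max_d:
        if condition_holds(succ, d):
            return d
        d += 1
    return None


# Hypothetical deadlock-free system with three configurations.
TOY = {"a": {"b"}, "b": {"c"}, "c": {"c"}}
print(diameter(TOY))
```

The cut-off `max_d` is an assumption of the sketch; the procedure in the text has no such bound and may not terminate.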


5.2 Bounded Diameter for a Fragment of STA 


In this section, we show that for a specific fragment of STA, we are able to give
a theoretical bound on the diameter, similar to the asynchronous case [20,21].
The STA that fall in this fragment are monotonic and 1-cyclic. An STA
is monotonic iff every counter atom changes its truth value at most once in
every path of a counter system induced by the STA and an admissible instance
$p \in \mathbf{P}_{RC}$. This implies that every schedule can be partitioned into finitely many
sub-schedules that satisfy a property we call steadiness. We call a schedule
steady if the set of rules whose guards are satisfied is the same in all of its
transitions. We also give a sufficient condition for monotonicity, using trapped
counter atoms, defined below. In a 1-cyclic STA, the only cycles that can be
formed by its rules are self-loops. Under these two conditions, we guarantee that
for every steady schedule, there exists a steady schedule of bounded length that
has the same origin and goal. We show that this bound depends on the counter
atoms $\Psi$ occurring in the guards of the STA, and the length of the longest path in
the STA, denoted by $c$. The main result of this section is stated by the theorem:


Theorem 1. For every feasible schedule $\tau$ in a counter system CS(STA, $p$),
where STA is monotonic and 1-cyclic, and $p \in \mathbf{P}_{RC}$, there exists a feasible
schedule $\tau'$ of length $O(|\Psi| c)$, such that $\tau$ and $\tau'$ have the same origin and goal.


To prove Theorem 1, we start by defining monotonic STA. 




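Definition 6 (Monotonic STA). An automaton STA = $(\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$
is monotonic iff for every path $\{\sigma_i\}_{i=0}^{k}$ in the counter system CS(STA, $p$), for
$p \in \mathbf{P}_{RC}$, and every counter atom $\psi \in \Psi$, we have $\sigma_i \models \psi$ implies $\sigma_j \models \psi$, for
$0 \le i < j \le k$.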


To show that we can partition a schedule into finitely many sub-schedules,
we need the notion of a context. A context of a transition $t \in T(p)$ is the set
$C_t = \{\psi \in \Psi \mid o(t) \models \psi\}$ of counter atoms $\psi$ satisfied in the origin $o(t)$ of
the transition $t$. Given a feasible schedule $\tau$, the point $i$ is a context switch if
$C_{t_{i-1}} \ne C_{t_i}$, for $1 < i \le |\tau|$.


Lemma 1. Every feasible schedule $\tau$ in a counter system induced by a monotonic
STA has at most $|\Psi|$ context switches.


Proof. Let $\tau = \{t_i\}_{i=1}^{k}$ be a feasible schedule and $\Psi$ the set of counter atoms
appearing on the rules of the monotonic STA. For every $\psi \in \Psi$, there is at most
one context switch $i$, for $1 < i \le k$, such that $\psi \notin C_{t_{i-1}}$ and $\psi \in C_{t_i}$.


Sufficient Condition for Monotonicity. We introduce trapped counter atoms. 


Definition 7. A set $L \subseteq \mathcal{L}$ of locations is called a trap iff for every $\ell \in L$ and
every $r \in \mathcal{R}$ such that $\ell = r.\mathit{from}$, it holds that $r.\mathit{to} \in L$.
A counter atom $\psi \equiv \#L \ge a \cdot \pi + b$ is trapped iff the set $L$ is a trap.


Lemma 2. Let $\psi \equiv \#L \ge a \cdot \pi + b$ be a trapped counter atom, $\sigma$ a configuration
such that $\sigma \models \psi$, and $t$ a transition enabled in $\sigma$. If $\sigma \xrightarrow{t} \sigma'$, then $\sigma' \models \psi$.


Corollary 1. Let STA = $(\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$ be an automaton such that all its
counter atoms are trapped. Then STA is monotonic.
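
The trap condition is a purely syntactic check on the rules, as the following sketch shows. It assumes that rules are given as hypothetical `(from, to)` pairs of location names; the rule list `RULES` is an illustrative example and is not taken from any figure of the paper.

```python
# Hypothetical rule graph, given as (from, to) pairs of location names.
RULES = [
    ("V0", "V0"), ("V0", "SE"), ("V0", "AC"),
    ("V1", "SE"), ("V1", "AC"),
    ("SE", "SE"), ("SE", "AC"),
    ("AC", "AC"),
]


def is_trap(locs, rules):
    """True iff no rule leads from a location in `locs` to one outside it."""
    return all(dst in locs for src, dst in rules if src in locs)


def all_atoms_trapped(atom_location_sets, rules):
    """Premise of Corollary 1: the location set of every counter atom is a trap."""
    return all(is_trap(set(locs), rules) for locs in atom_location_sets)


print(is_trap({"V1", "SE", "AC"}, RULES))  # True: no rule leaves this set
print(is_trap({"V0"}, RULES))              # False: V0 -> SE leaves the set
```

If `all_atoms_trapped` returns true for the location sets of all counter atoms, Corollary 1 yields monotonicity; a negative answer is inconclusive, since the condition is only sufficient.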


Steady Schedules. We define the notion of steadiness, similarly to [20].

Definition 8. A schedule $\tau = \{t_i\}_{i=1}^{k}$ is steady if $C_{t_i} = C_{t_j}$, for $0 < i < j \le k$.


We now focus on shortening steady schedules. That is, given a steady sched- 
ule, we construct a schedule of bounded length with the same origin and goal. 

Observe that STA = $(\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$ can be seen as a directed graph
$G_{STA}$, with vertices corresponding to the locations $\ell \in \mathcal{L}$, and edges corresponding
to the rules $r \in \mathcal{R}$. We denote by $c$ the length of the longest path between
two nodes in the graph $G_{STA}$, and call it the longest chain of STA. If $G_{STA}$
contains only cycles of length one, then STA is called 1-cyclic.
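
Both notions can be computed by one depth-first traversal of the rule graph. The sketch below assumes that rules are hypothetical `(from, to)` pairs, that path length is counted in edges, and that self-loops are ignored when measuring the longest chain; the function name `longest_chain` and the example graphs are illustrative.

```python
def longest_chain(locations, rules):
    """Return the longest chain c, or None if the graph is not 1-cyclic.

    `rules` is an iterable of (from, to) location pairs. Self-loops are
    ignored; any other cycle means the graph is not 1-cyclic.
    """
    succ = {loc: set() for loc in locations}
    for src, dst in rules:
        if src != dst:
            succ[src].add(dst)

    in_progress = object()
    best = {}  # location -> length of the longest path starting there

    def visit(loc):
        if best.get(loc) is in_progress:
            raise ValueError("cycle that is not a self-loop")
        if loc not in best:
            best[loc] = in_progress
            best[loc] = max((1 + visit(n) for n in succ[loc]), default=0)
        return best[loc]

    try:
        return max((visit(loc) for loc in locations), default=0)
    except ValueError:
        return None


# Hypothetical rule graphs.
LOCS = ["V0", "V1", "SE", "AC"]
RULES = [("V0", "V0"), ("V0", "SE"), ("V0", "AC"), ("V1", "SE"),
         ("V1", "AC"), ("SE", "SE"), ("SE", "AC"), ("AC", "AC")]
print(longest_chain(LOCS, RULES))                           # 2
print(longest_chain(["A", "B"], [("A", "B"), ("B", "A")]))  # None
```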

To shorten steady schedules, in addition to monotonicity, we require that the 
STA are also 1-cyclic. In the following, we assume that the schedules we shorten 
come from counter systems induced by monotonic and 1-cyclic STA. Intuitively, 
if a given schedule is longer than the longest chain of the STA, then in some 
transition of the schedule some processes followed a rule which is a self-loop. 
As processes may follow self-loops at different transitions, we cannot shorten the 
given schedule by eliminating transitions as a whole. Instead, we deconstruct the 
original schedule into sequences of process steps, which we call runs, shorten the 
runs, and reconstruct a new shorter schedule from the shortened runs. The main 
challenge is to show that the newly obtained schedule is feasible and steady.

Schedules as Multisets of Runs. We proceed by defining runs and showing that
each schedule can be represented by a multiset of runs.

We call a run the sequence $\varrho = \{r_i\}_{i=1}^{k}$ of rules, for $r_i \in \mathcal{R}$, such that
$r_i.\mathit{to} = r_{i+1}.\mathit{from}$, for $0 < i < k$. We denote by $\varrho[i] = r_i$ the $i$-th rule in the
run $\varrho$, and by $|\varrho|$ the length of the run. The following lemma shows that a feasible
schedule can be deconstructed into a multiset of runs.


Lemma 3. For every feasible schedule $\tau = \{t_i\}_{i=1}^{k}$, there exists a multiset
$(P, m)$, where

1. $P$ is a set of runs $\varrho$ of length $k$, and
2. $m : P \to \mathbb{N}$ is a multiplicity function, such that for every location $\ell \in \mathcal{L}$, it
   holds that $\sum_{r.\mathit{from} = \ell} t_i(r) = \sum_{\varrho[i].\mathit{from} = \ell} m(\varrho)$, for $0 < i \le k$.


A multiset $(P, m)$ of runs of length $k$ defines a schedule $\tau = \{t_i\}_{i=1}^{k}$ of length
$k$, and we have $t_i(r) = \sum_{\varrho[i] = r} m(\varrho)$, for every rule $r \in \mathcal{R}$ and $0 < i \le k$.
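
The passage from a multiset of runs to a schedule amounts to counting, for each step, how many processes apply each rule. The following sketch assumes that a run is a tuple of rule names and that the multiset is a dictionary from runs to multiplicities; the representation and the names `schedule_from_runs` and `RUNS` are illustrative.

```python
from collections import Counter


def schedule_from_runs(runs):
    """Build the transitions t_1..t_k from a multiset of runs.

    `runs` maps each run (a tuple of rule names, all of length k) to its
    multiplicity m. Transition i is returned as a Counter: rule -> number
    of processes that apply the rule in step i.
    """
    lengths = {len(run) for run in runs}
    if len(lengths) > 1:
        raise ValueError("all runs must have the same length")
    k = lengths.pop() if lengths else 0
    schedule = [Counter() for _ in range(k)]
    for run, mult in runs.items():
        for i, rule in enumerate(run):
            schedule[i][rule] += mult
    return schedule


# Hypothetical multiset: two processes follow (r1, r3), one follows (r2, r3).
RUNS = {("r1", "r3"): 2, ("r2", "r3"): 1}
for i, t in enumerate(schedule_from_runs(RUNS), start=1):
    print(i, dict(t))
```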

For the counter systems of STA, which are both monotonic and 1-cyclic, we 
show that their steady schedules can be shortened, so that their length does not 
exceed the longest chain c (that is, the length of the longest path in the STA). 


Lemma 4. Let $\tau$ be a steady feasible schedule in a counter system induced by
a monotonic and 1-cyclic STA. If $|\tau| > c+1$, then there exists a steady feasible
schedule $\tau'$ such that $|\tau'| = |\tau| - 1$, and $\tau$, $\tau'$ have the same origin and goal.

Proof (Sketch). If $\tau = \{t_i\}_{i=1}^{k+1}$, with $|\tau| = k+1 > c+1$, is a steady schedule,
then $C_{t_1} = C_{t_{k+1}}$, and its prefix $\theta = \{t_i\}_{i=1}^{k}$ is a steady and feasible schedule, with
$k > c$. By Lemma 3, there is a multiset $(P, m)$ of runs of length $k$ describing $\theta$.
Since $k > c$, and $c$ is the longest chain in the STA, which is 1-cyclic, it must
be the case that every run in $P$ contains at least one self-loop. Construct a new
multiset $(P', m')$ of runs of length $k-1$, such that each $\varrho' \in P'$ is obtained from
some $\varrho \in P$ by removing one occurrence of a self-loop rule. The multiset $(P', m')$
defines the schedule $\theta' = \{t'_i\}_{i=1}^{k-1}$. Because of the monotonicity and steadiness
of $\theta$, and because we only remove self-loops (which go from and to the same
location) when we build $\theta'$ from $\theta$, the feasibility is preserved, that is, it holds
that $g(t'_{i-1}) = o(t'_i)$, for $1 < i < k$, and that no guards false in $\theta$ become true
in $\theta'$. Furthermore, it is easy to check that $\theta'$ has the same origin and goal as $\theta$.
As the goal of $\theta'$ is the origin of $t_{k+1}$, construct a schedule $\tau' = \{t'_i\}_{i=1}^{k}$, where
$t'_k = t_{k+1}$. As $\tau$ is steady, the transitions $t_1$ and $t_{k+1}$ have the same contexts.
From $o(t'_1) = o(t_1)$ and $o(t'_k) = o(t_{k+1})$, we get that $t'_1$ and $t'_k$ have the same
contexts, which, together with the monotonicity, implies that $\tau'$ is steady.
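
The construction of the shorter multiset $(P', m')$ can be sketched as follows. The proof sketch does not say which self-loop occurrence to remove; the code below picks the first one, which is an assumption, and it does not check feasibility or steadiness of the resulting schedule. Runs are assumed to be tuples of rule names, and `self_loops` is a hypothetical set naming the rules whose source and target locations coincide.

```python
from collections import Counter


def drop_one_self_loop(runs, self_loops):
    """Remove the first self-loop rule from every run of a multiset.

    `runs` maps runs (tuples of rule names) to multiplicities. Raises
    ValueError if some run has no self-loop, i.e. if the premise k > c
    does not hold for it.
    """
    shorter = Counter()
    for run, mult in runs.items():
        for i, rule in enumerate(run):
            if rule in self_loops:
                # Distinct runs may collapse to the same shorter run,
                # in which case their multiplicities add up.
                shorter[run[:i] + run[i + 1:]] += mult
                break
        else:
            raise ValueError(f"run {run!r} contains no self-loop")
    return shorter


# Hypothetical rules: 'stayA' loops on A, 'go' leads from A to B, 'stayB' loops on B.
RUNS = {("stayA", "go"): 2, ("go", "stayB"): 1}
print(drop_one_self_loop(RUNS, {"stayA", "stayB"}))
```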


As a consequence of Lemmas 1 and 4, we obtain Theorem 1, which tells us
that for any feasible schedule, there exists a feasible schedule of length $O(|\Psi| c)$.
This bound does not depend on the parameters, but on the number of context
switches and the longest chain $c$, which are properties of the STA.
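
One way to see where the bound comes from, as a sketch that fills in constants the text leaves implicit: by Lemma 1, a feasible schedule $\tau$ splits into at most $|\Psi| + 1$ maximal steady sub-schedules $\tau_1, \dots, \tau_m$, and applying Lemma 4 repeatedly shortens each $\tau_j$ to a steady feasible schedule $\tau'_j$ of length at most $c + 1$ with the same origin and goal. Concatenating the shortened pieces gives, assuming $|\Psi| \ge 1$ and $c \ge 1$:

```latex
\[
  |\tau'| \;=\; \sum_{j=1}^{m} |\tau'_j|
  \;\le\; (|\Psi| + 1)\,(c + 1)
  \;\le\; 4\,|\Psi|\,c
  \;=\; O(|\Psi|\, c).
\]
```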


6 Bounded Model Checking of Safety Properties


Once we obtain the diameter bound d (either using the procedure from Sect. 5.1, 
or by Theorem 1), we use it as a completeness threshold for bounded model 
checking. For the algorithms that we verify, we express the violations of their 
safety properties as reachability queries on bounded executions. The length of the
bounded executions depends on $d$, and on whether the algorithm assumes that
every execution contains a clean round.


Checking Safety for Algorithms that do not Assume a Clean Round. Here, we
search for violations of safety properties in executions of length $e \le d$, by checking
satisfiability of the formula:

$$
\exists p \in \mathbf{P}_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ \mathit{Init}(\sigma_0) \wedge \mathit{Path}(\sigma_0, \sigma_e, e) \wedge \mathit{Bad}(\sigma_e) \tag{2}
$$


where the predicate $\mathit{Init}(\sigma)$ encodes that $\sigma$ is an initial configuration, together
with the constraints imposed on the initial configuration by the safety property,
and $\mathit{Bad}(\sigma)$ encodes the bad configuration, which, if reachable, violates safety.

For example, the algorithm in Fig. 1 has to satisfy the safety property unforgeability:
If no process sets v to 1 initially, then no process ever sets accept to true.
In our encoding, we check executions of length $e \le d$, whose initial configuration
has the counter $\kappa[\mathrm{V1}] = 0$. In a bad configuration, the counter $\kappa[\mathrm{AC}] > 0$. Thus,
to find violations of unforgeability, in formula (2), we set:

$$
\begin{aligned}
\mathit{Init}(\sigma_0) &\equiv \sigma_0.\kappa[\mathrm{V0}] + \sigma_0.\kappa[\mathrm{V1}] = N(p) \wedge \sigma_0.\kappa[\mathrm{V1}] = 0\\
\mathit{Bad}(\sigma_e) &\equiv \sigma_e.\kappa[\mathrm{AC}] > 0
\end{aligned}
$$
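
The shape of query (2) can be mimicked on an explicit transition system by a breadth-first search that is cut off at the bound. The sketch below is not the SMT encoding used in the paper: it assumes one fixed parameter instance, a hypothetical successor dictionary `succ`, and a predicate `is_bad` over configurations. All names and the system `SUCC` are illustrative.

```python
def find_violation(inits, succ, is_bad, bound):
    """Return a path with at most `bound` transitions from an initial to a
    bad configuration, or None if no such path exists."""
    frontier = {s: (s,) for s in inits}  # configuration -> one path reaching it
    for _ in range(bound + 1):
        for s, path in frontier.items():
            if is_bad(s):
                return path
        nxt = {}
        for s, path in frontier.items():
            for t in succ[s]:
                nxt.setdefault(t, path + (t,))
        frontier = nxt
    return None


# Hypothetical system in which the bad configuration is three transitions away.
SUCC = {"i": {"a"}, "a": {"b"}, "b": {"b", "bad"}, "bad": {"bad"}}
print(find_violation({"i"}, SUCC, lambda s: s == "bad", 2))  # None
print(find_violation({"i"}, SUCC, lambda s: s == "bad", 3))  # ('i', 'a', 'b', 'bad')
```

A result of `None` is conclusive only when `bound` is at least the diameter, which is why the diameter serves as a completeness threshold.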


Checking Safety for Algorithms with a Clean Round. We check for violations
of safety in executions of length $e \le 2d$, where $e = e_1 + e_2$ such that: (i) we
find all reachable clean-round configurations in an execution of length $e_1$, for
$e_1 \le d$, such that the last configuration $\sigma_{e_1}$ satisfies the clean round condition,
and (ii) we check if a bad configuration is reachable from $\sigma_{e_1}$ by a path of length
$e_2 \le d$. That is, we check satisfiability of the formula:

$$
\begin{aligned}
&\exists p \in \mathbf{P}_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ \mathit{Init}(\sigma_0) \wedge \mathit{Path}(\sigma_0, \sigma_{e_1}, e_1)\\
&\quad \wedge\ \mathit{Clean}(\sigma_{e_1}) \wedge \mathit{Path}(\sigma_{e_1}, \sigma_e, e_2) \wedge \mathit{Bad}(\sigma_e)
\end{aligned}
\tag{3}
$$

where the predicate $\mathit{Clean}(\sigma)$ encodes the clean round condition.

For example, one of the safety properties that the FloodMin algorithm for
$k = 1$ (Fig. 2) has to satisfy is k-agreement, which requires that at most $k$
different values are decided. In the original algorithm, the processes decide after
$\lfloor t/k \rfloor + 1$ rounds, such that at least one of them is the clean round, in which at
most $k - 1$ processes crash. In our encoding, we check paths of length $e \le 2d$.
We enforce the clean round condition by asserting that the sum of the counters
of the locations C0, C1 is $k - 1 = 0$ in the configuration $\sigma_{e_1}$. The property
1-agreement is violated if in the last configuration both the counters $\kappa[\mathrm{V0}]$ and
$\kappa[\mathrm{V1}]$ are non-zero. That is, to check 1-agreement, in formula (3) we set:

$$
\begin{aligned}
\mathit{Init}(\sigma_0) &\equiv \sigma_0.\kappa[\mathrm{V0}] + \sigma_0.\kappa[\mathrm{V1}] + \sigma_0.\kappa[\mathrm{C0}] + \sigma_0.\kappa[\mathrm{C1}] = N(p)\\
\mathit{Clean}(\sigma_{e_1}) &\equiv \sigma_{e_1}.\kappa[\mathrm{C0}] + \sigma_{e_1}.\kappa[\mathrm{C1}] = 0\\
\mathit{Bad}(\sigma_e) &\equiv \sigma_e.\kappa[\mathrm{V0}] > 0 \wedge \sigma_e.\kappa[\mathrm{V1}] > 0
\end{aligned}
$$


7 Experimental Evaluation 


The algorithms that we model using STA and verify by bounded model check- 
ing are designed for different fault models, which in our case are crashes, send 
omissions or Byzantine faults. We now proceed by introducing our benchmarks. 
Their encodings, together with the implementations of the procedures for finding 
the diameter and applying bounded model checking are available at [1]. 


Algorithms without a Clean Round Assumption. We consider three variants of 
the synchronous reliable broadcast algorithm, whose STA are monotonic and 1- 
cyclic (i.e., Theorem 1 applies). These algorithms assume different fault models: 


– rb [31] (Fig. 1): reliable broadcast with at most t Byzantine faults;
– rb_hybrid [10]: reliable broadcast with at most t hybrid faults: at most b
  Byzantine and at most s send omissions, with t = b + s;
– rb_omit [10]: reliable broadcast with at most t send omissions.


Algorithms with a Clean Round. We encode several algorithms from this class, 
that solve the consensus or k-set agreement problem: 


– fair_cons [30], floodset [26]: consensus with crash faults;
– floodmin, for $k \in \{1, 2\}$ [26] (Fig. 2): k-set agreement with crash faults;
– kset_omit, for $k \in \{1, 2\}$ [30]: k-set agreement with send omission faults;
– phase_king [8,9], phase_queen [8]: consensus with Byzantine faults.


These algorithms have a structure similar to the one depicted in Fig. 2, with 
the exception of phase_king and phase_queen. Their loop body consists of sev- 
eral message exchange steps, which correspond to multiple rounds, grouped in a 
phase. In each phase, a designated process acts as a coordinator. 


Computing the Diameter. We implemented the procedure from Sect.5.1 in 
Python. The implementation uses a back-end SMT solver (currently, z3 and 
cvc4). Our tool computed diameter bounds for all of our benchmarks, even for 
those for which we do not have a theoretical guarantee. Our experiments reveal 
extremely low values for the diameter, that range between 1 and 4. The values 
for the diameter and the time needed to compute them are presented in Table 2.

Table 2. Results for our benchmarks, available at [1]: $|\mathcal{L}|$, $|\mathcal{R}|$, $|\Psi|$, RC are the number
of locations, rules, atomic guards, and the resilience condition in each STA; $d$ is the
diameter computed using SMT, $c$ is the longest chain of the algorithms whose STA are
monotonic and 1-cyclic; $\tau$ is the time (in seconds) to compute the diameter using SMT;
T, SMT is the time to check reachability using the diameter computed using the SMT
procedure from Sect. 5.1; T, Thm. 1 is the time to check reachability using the bound
obtained by Theorem 1. For the cases where Theorem 1 is not applicable, we write (–).
The experiments were run on a machine with Intel(R) Core(TM) i5-4210U CPU and
4GB of RAM, using z3-4.8.1 and cvc4-1.6.

algorithm       |L|  |R|  |Ψ|  RC           d  τ z3   τ cvc4  T,SMT z3  T,SMT cvc4  c  T,Thm.1 z3  T,Thm.1 cvc4
rb              4    8    4    n > 3t       2  0.27   0.99    0.08      0.08        2  0.42        0.86
rb_hybrid       8    16   4    n > 3b + 2s  2  1.16   37.6    0.09      0.15        2  0.67        1.73
rb_omit         8    16   4    n > 2t       2  0.43   2.47    0.09      0.14        2  0.58        1.43
fair_cons       11   20   2    n > t        2  0.97   10.9    0.27      0.47        –  –           –
floodmin, k=1   5    9    2    n > t        2  0.21   0.86    0.18      0.29        –  –           –
floodmin, k=2   7    16   4    n > t        2  0.53   7.43    0.22      0.52        –  –           –
floodset        7    14   4    n > t        2  0.36   3.01    0.21      0.49        –  –           –
kset_omit, k=1  4    6    2    n > t        1  0.08   0.09    0.04      0.03        –  –           –
kset_omit, k=2  6    12   4    n > t        1  0.17   0.27    0.04      0.07        –  –           –
phase_king      34   72   12   n > 3t       4  12.9   50.5    1.41      5.12        –  –           –
phase_queen     24   42   8    n > 4t       3  1.78   17.7    0.36      1.92        –  –           –


Checking the Algorithms. We have implemented another Python function which
encodes violations of the safety properties as reachability properties on paths
of bounded length, as described in Sect. 6, and uses a back-end SMT solver
to check their satisfiability. Table 2 contains the results that we obtained by
checking reachability for our benchmarks, using the diameter bound computed
by the procedure from Sect. 5.1, and the diameter bound from Theorem 1 for
algorithms whose STA are monotonic and 1-cyclic.

To our knowledge, we are the first to verify the listed algorithms that work
with send omission, Byzantine and hybrid faults. For the algorithms with crash
faults, our approach is a significant improvement over the results obtained using
the abstraction-based method from [3].


Counterexamples. Our tool found a bug in the version of the phase_king algorithm
that was given in [8], which was corrected in the version of the algorithm
in [9]. The version from [8] had the wrong threshold ‘$> n - t$’ in one guard,
while the one in [9] had ‘$\ge n - t$’ for the same guard. To test our tool, we
produced erroneous encodings for our benchmarks, and checked them. For rb,
rb_hybrid, rb_omit, phase_king, and phase_queen, we tweaked the resilience
condition, and introduced more faults than expected by the algorithm, e.g., by
setting $f > t$ (instead of $f \le t$) in the STA in Fig. 1. For fair_cons, floodmin,
floodset, and kset_omit, we checked executions without a clean round. For all
of the erroneous encodings, our tool produces counterexamples in seconds.


8 Discussion and Related Work


Parameterized verification of synchronous and partially synchronous distributed 
algorithms has recently gained attention. Both models have in common that 
distributed computations are organized in rounds and processes (conceptually) 
move in lock-step. For partially synchronous consensus algorithms, the authors 
of [15] introduced a consensus logic and (semi-)decision procedures. Later, the 
authors of [27] introduced a language for partially synchronous consensus algo- 
rithms, and proved cut-off theorems specialized to the properties of consensus: 
agreement, validity, and termination. Concerning synchronous algorithms, the 
authors of [3] introduced an abstraction-based model checking technique for 
crash-tolerant synchronous algorithms with existential guards. In contrast to 
their work, we allow more general guards that contain linear expressions over 
the parameters, e.g., $n - t$. Our method offers more automation, and our experimental
evaluation shows that our technique is faster than the technique of [3].

We introduce a synchronous variant of threshold automata, which were pro- 
posed in [21] for asynchronous algorithms. Several extensions of this model were 
recently studied in [23], but the synchronous case was not considered. STA extend 
the guarded protocols by [16], in which a process can check only if a sum of 
counters is different from 0 or n. Generalizing the results from [16] to STA is 
not straightforward. In [2], safety of finite-state transition systems over infinite 
data domains was reduced to backwards reachability checking using a fixpoint 
computation, as long as the transition systems are well-structured. It would be 
interesting to put our results in this context. A decidability result for liveness 
properties of parameterized timed networks was obtained in [4], employing lin- 
ear programming for the analysis of vector addition systems with a parametric 
initial state. We plan to investigate the use of similar ideas for analyzing liveness 
properties of STA. 

The 1-cyclicity condition is reminiscent of flat counter automata [25]. In
Fig. 3, we show a possible translation of an STA to a counter automaton (similar
to the translation for asynchronous threshold automata from [23]). We note
that the counter automaton is not flat, due to the presence of the outer loop,
which models a transition to the next round. By knowing a bound $d$ on the
diameter (e.g., by Theorem 1), one can flatten the counter automaton by unfolding
the outer loop $d$ times. We also experimented with FAST [6] on two of our
benchmarks: rb and floodmin for $k = 1$, depicted in Figs. 1 and 2 respectively.
FAST terminated on rb, but took significantly longer than our tool on the same
machine (i.e., hours rather than seconds). FAST ran out of memory when checking
floodmin.

Fig. 3. A counter automaton for the STA in Fig. 1, with $\varphi_0 \equiv x < t+1$, $\varphi_1 \equiv x + f \ge t+1$,
$\varphi_2 \equiv x + f \ge n - t$, $\varphi_3 \equiv x < n - t$, where $x$ counts the number of processes in
locations V1, SE, AC; and $n$, $t$, $f$ are counters for the parameters. On a path from $s_0$ to
$s_7$, the counters $\ell \in \{\mathrm{V0}, \mathrm{V1}, \mathrm{SE}, \mathrm{AC}\}$ are emptied, while the counters $n\ell$ are populated.
This models the transitions from one location to another in the current round.

Our experiments show that STA that are neither monotonic, nor 1-cyclic still 
may have bounded diameters. Finding other classes of STA for which one could 
derive the diameter bounds is a subject of future work. Although we considered 
only reachability properties in this work—which happened to be challenging—we 
are going to investigate completeness thresholds for liveness in the future.


Abstract. In this paper we introduce a notion of fault-tolerance distance
between labeled transition systems. Intuitively, this notion of distance
measures the degree of fault-tolerance exhibited by a candidate
system. In practice, there are different kinds of fault-tolerance; here we
restrict ourselves to the analysis of masking fault-tolerance because it
is often a highly desirable goal for critical systems. Roughly speaking,
a system is masking fault-tolerant when it is able to completely mask
the faults, not allowing these faults to have any observable consequences
for the users. We capture masking fault-tolerance via a simulation relation,
which is accompanied by a corresponding game characterization.
We enrich the resulting games with quantitative objectives to define the
notion of masking fault-tolerance distance. Furthermore, we investigate
the basic properties of this notion of masking distance, and we prove that
it is a directed semimetric. We have implemented our approach in a prototype
tool that automatically computes the masking distance between
a nominal system and a fault-tolerant version of it. We have used this
tool to measure the masking tolerance of multiple instances of several
case studies.


1 Introduction 


Fault-tolerance allows for the construction of systems that are able to overcome
the occurrence of faults during their execution. Examples of fault-tolerant
systems can be found everywhere: communication protocols, hardware circuits,
avionic systems, cryptographic currencies, etc. So, the increasing relevance of
critical software in everyday life has led to a renewed interest in the automatic
verification of fault-tolerant properties. However, one of the main difficulties
when reasoning about these kinds of properties is given by their quantitative
nature, which is true even for non-probabilistic systems. A simple example is
given by the introduction of redundancy in critical systems. This is, by far,
one of the most used techniques in fault-tolerance. In practice, it is well-known
that adding more redundancy to a system increases its reliability. Measuring
this increment is a central issue for evaluating fault-tolerant software, protocols,
etc. On the other hand, the formal characterization of fault-tolerant properties
can be an involved task; usually these properties are encoded using ad-hoc
mechanisms as part of a general design.

The usual flow for the design and verification of fault-tolerant systems con- 
sists in defining a nominal model (i.e., the “fault-free” or “ideal” program) 
and afterwards extending it with faulty behaviors that deviate from the nor- 
mal behavior prescribed by the nominal model. This extended model represents 
the way in which the system operates under the occurrence of faults. There
are different ways of extending the nominal model; the typical approach is fault
injection [20,21], that is, the automatic introduction of faults into the model.
An important property that any extended model has to satisfy is the preserva- 
tion of the normal behavior under the absence of faults. In [11], we proposed an 
alternative formal approach for dealing with the analysis of fault-tolerance. This 
approach allows for a fully automated analysis and appropriately distinguishes 
faulty behaviors from normal ones. Moreover, this framework is amenable to 
fault-injection. In that work, three notions of simulation relations are defined 
to characterize masking, nonmasking, and failsafe fault-tolerance, as originally 
defined in [15]. 

During the last decade, significant progress has been made towards defining 
suitable metrics or distances for diverse types of quantitative models, including
real-time systems [19], probabilistic models [12], and metrics for linear and
branching systems [6,8,18,23,29]. Some authors have already pointed out that
these metrics can be useful to reason about the robustness of a system, a notion 
related to fault-tolerance. Particularly, in [6] the traditional notion of simulation 
relation is generalized and three different simulation distances between systems 
are introduced, namely correctness, coverage, and robustness. These are defined 
using quantitative games with discounted-sum and mean-payoff objectives. 

In this paper we introduce a notion of fault-tolerance distance between labelled
transition systems. Intuitively, this distance measures the degree of fault-tolerance
exhibited by a candidate system. As mentioned above, there exist different
levels of fault-tolerance; we restrict ourselves to the analysis of masking
fault-tolerance because it is often classified as the most benign kind of fault-tolerance
and it is a highly desirable property for critical systems. Roughly speaking, a system
is masking fault-tolerant when it is able to completely mask the faults, not
allowing these faults to have any observable consequences for the users. Formally,
the system must preserve both the safety and liveness properties of the nominal
model [15]. In contrast to the robustness distance defined in [6], which measures
how many unexpected errors are tolerated by the implementation, we consider a
specific collection of faults given in the implementation and measure how many
faults are tolerated by the implementation in such a way that they can be masked
by the states. We also require that the normal behavior of the specification has
to be preserved by the implementation when no faults are present. In this case,
we have a bisimulation between the specification and the non-faulty behavior of
the implementation. Otherwise, the distance is 1. That is, $\delta_m(N, I) = 1$ if and
only if the nominal model $N$ and $I \backslash F$ are not bisimilar, where $I \backslash F$ behaves like
the implementation $I$ where all actions in $F$ are forbidden ($\backslash$ is Milner’s restriction
operator). Thus, we effectively distinguish between the nominal model and
its fault-tolerant version and the set of faults taken into account.

In order to measure the degree of masking fault-tolerance of a given system,
we start by characterizing masking fault-tolerance via simulation relations between
two systems as defined in [11]: the first one acts as a specification of the
intended behavior (i.e., nominal model) and the second one as the fault-tolerant
implementation (i.e., the extended model with faulty behavior). The existence
of a masking relation implies that the implementation masks the faults. Afterwards,
we introduce a game characterization of masking simulation and we enrich
the resulting games with quantitative objectives to define the notion of masking
fault-tolerance distance, where the possible values of the game belong to
the interval [0,1]. The fault-tolerant implementation is masking fault-tolerant
if the value of the game is 0. Furthermore, the bigger the number, the farther
the masking distance between the fault-tolerant implementation and the specification.
Accordingly, a bigger distance means markedly less fault-tolerance.
Thus, for a given nominal model $N$ and two different fault-tolerant implementations
$I_1$ and $I_2$, our distance ensures that $\delta_m(N, I_1) \le \delta_m(N, I_2)$ whenever
$I_1$ tolerates more faults than $I_2$. We also provide a weak version of masking
simulation, which makes it possible to deal with complex systems composed of
several interacting components. We prove that masking distance is a directed
semimetric, that is, it satisfies two basic properties of any distance, reflexivity
and the triangle inequality.

Finally, we have implemented our approach in a tool that takes as input a
nominal model and its fault-tolerant implementation and automatically computes
the masking distance between them. We have used this tool to measure the
masking tolerance of multiple instances of several case studies such as a redundant
cell memory, a variation of the dining philosophers problem, the bounded
retransmission protocol, N-Modular-Redundancy, and the Byzantine generals
problem. These are typical examples of fault-tolerant systems.

The remainder of the paper is structured as follows. In Sect. 2, we introduce
preliminary notions used throughout this paper. We present in Sect. 3 the formal
definition of masking distance built on quantitative simulation games and
we also prove its basic properties. We describe in Sect. 4 the experimental evaluation
on some well-known case studies. In Sect. 5 we discuss the related work.
Finally, we discuss in Sect. 6 some conclusions and directions for further work.
Full details and proofs can be found in [5].


2 Preliminaries


Let us introduce some basic definitions and results on game theory that will be
necessary across the paper; the interested reader is referred to [2].

A transition system (TS) is a tuple $A = \langle S, \Sigma, E, s_0 \rangle$, where $S$ is a finite set
of states, $\Sigma$ is a finite alphabet, $E \subseteq S \times \Sigma \times S$ is a set of labelled transitions, and
$s_0$ is the initial state. In the following we use $s \xrightarrow{e} s' \in E$ to denote $(s, e, s') \in E$.
Let $|S|$ and $|E|$ denote the number of states and edges, respectively. We define
$post(s) = \{s' \in S \mid s \xrightarrow{e} s' \in E\}$ as the set of successors of $s$. Similarly,
$pre(s') = \{s \in S \mid s \xrightarrow{e} s' \in E\}$ is the set of predecessors of $s'$. Moreover,
$post^*(s)$ denotes the states which are reachable from $s$. Without loss of generality,
we require that every state $s$ has a successor, i.e., $\forall s \in S : post(s) \ne \emptyset$. A run in
a transition system $A$ is an infinite path $\rho = \rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \dots \in (S \cdot \Sigma)^{\omega}$ where
$\rho_0 = s_0$ and for all $i$, $\rho_i \xrightarrow{\sigma_i} \rho_{i+1} \in E$. From now on, given a tuple $(x_0, \dots, x_n)$,
we denote $x_i$ by $pr_i((x_0, \dots, x_n))$.

A game graph $G$ is a tuple $G = \langle S, S_1, S_2, \Sigma, E, s_0 \rangle$ where $S$, $\Sigma$, $E$ and $s_0$ are
as in transition systems and $(S_1, S_2)$ is a partition of $S$. The choice of the next
state is made by Player 1 (Player 2) when the current state is in $S_1$ (respectively,
$S_2$). A weighted game graph is a game graph along with a weight function $v^G$
from $E$ to $\mathbb{Q}$. A run in the game graph $G$ is called a play. The set of all plays is
denoted by $\Omega$.

Given a game graph $G$, a strategy for Player 1 is a function $\pi : (S \cdot \Sigma)^* S_1 \to
\Sigma \times S$ such that for all $\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i \in (S \cdot \Sigma)^* S_1$, we have that if
$\pi(\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i) = (\sigma, \rho)$, then $\rho_i \xrightarrow{\sigma} \rho \in E$. A strategy for Player 2 is
defined in a similar way. The set of all strategies for Player $p$ is denoted by
$\Pi_p$. A strategy for player $p$ is said to be memoryless (or positional) if it can
be defined by a mapping $f : S_p \to E$ such that for all $s \in S_p$ we have that
$pr_0(f(s)) = s$, that is, these strategies do not need memory of the past history.
Furthermore, a play $\rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \dots$ conforms to a player $p$ strategy $\pi$ if
$\forall i \ge 0 : (\rho_i \in S_p) \Rightarrow (\sigma_i, \rho_{i+1}) = \pi(\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i)$. The outcome of a Player
1 strategy $\pi_1$ and a Player 2 strategy $\pi_2$ is the unique play, named $out(\pi_1, \pi_2)$,
that conforms to both $\pi_1$ and $\pi_2$.

A game is made of a game graph and a boolean or quantitative objective.
A boolean objective is a function $\Phi : \Omega \to \{0,1\}$ and the goal of Player 1 in a
game with objective $\Phi$ is to select a strategy so that the outcome maps to 1,
independently of what Player 2 does. On the contrary, the goal of Player 2 is to
ensure that the outcome maps to 0. Given a boolean objective $\Phi$, a play $\rho$ is
winning for Player 1 (resp. Player 2) if $\Phi(\rho) = 1$ (resp. $\Phi(\rho) = 0$). A strategy
$\pi$ is a winning strategy for Player $p$ if every play conforming to $\pi$ is winning
for Player $p$. We say that a game with boolean objective is determined if some
player has a winning strategy, and we say that it is memoryless determined if
that winning strategy is memoryless. Reachability games are those games whose
objective functions are defined as $\Phi(\rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \ldots) = (\exists i : \rho_i \in V)$ for
some set $V \subseteq S$; a standard result is that reachability games are memoryless
determined.

A quantitative objective is given by a payoff function $f : \Omega \to \mathbb{R}$ and the goal
of Player 1 is to maximize the value $f$ of the play, whereas the goal of Player
2 is to minimize it. For a quantitative objective $f$, the value of the game for a
Player 1 strategy $\pi_1$, denoted by $v_1(\pi_1)$, is defined as the infimum over all the
values resulting from Player 2 strategies, i.e., $v_1(\pi_1) = \inf_{\pi_2 \in \Pi_2} f(out(\pi_1, \pi_2))$.
The value of the game for Player 1 is defined as the supremum of the values of
all Player 1 strategies, i.e., $\sup_{\pi_1 \in \Pi_1} v_1(\pi_1)$. Analogously, the value of the game
for a Player 2 strategy $\pi_2$ and the value of the game for Player 2 are defined
as $v_2(\pi_2) = \sup_{\pi_1 \in \Pi_1} f(out(\pi_1, \pi_2))$ and $\inf_{\pi_2 \in \Pi_2} v_2(\pi_2)$, respectively. We say
that a game is determined if both values are equal, that is:
$\sup_{\pi_1 \in \Pi_1} v_1(\pi_1) = \inf_{\pi_2 \in \Pi_2} v_2(\pi_2)$. In this case we denote by $val(G)$ the value of game $G$.
The following result from [24] characterizes a large set of determined games.


Theorem 1. Any game with a quantitative function f that is bounded and Borel 
measurable is determined. 


3 Masking Distance 


We start by defining masking simulation. In [11], we have defined a state-based
simulation for masking fault-tolerance; here we recast this definition
using labelled transition systems. First, let us introduce some concepts needed
for defining masking fault-tolerance. For any vocabulary $\Sigma$, and set of labels
$\mathcal{F} = \{F_0, \ldots, F_n\}$ not belonging to $\Sigma$, we consider $\Sigma_{\mathcal{F}} = \Sigma \cup \mathcal{F}$, where $\mathcal{F} \cap \Sigma = \emptyset$.
Intuitively, the elements of $\mathcal{F}$ indicate the occurrence of a fault in a faulty
implementation. Furthermore, sometimes it will be useful to consider the set
$\Sigma^i = \{e^i \mid e \in \Sigma\}$, containing the elements of $\Sigma$ indexed with superscript $i$.
Moreover, for any vocabulary $\Sigma$ we consider $\Sigma_M = \Sigma \cup \{M\}$, where $M \notin \Sigma$;
intuitively, this label is used to identify masking transitions.

Given a transition system $A = \langle S, \Sigma, E, s_0 \rangle$ over a vocabulary $\Sigma$, we denote
$A^M = \langle S, \Sigma_M, E^M, s_0 \rangle$ where $E^M = E \cup \{s \xrightarrow{M} s \mid s \in S\}$.


3.1 Strong Masking Simulation 


Definition 1. Let $A = \langle S, \Sigma, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$ be two transition
systems. $A'$ is strong masking fault-tolerant with respect to $A$ if there exists a
relation $\mathbf{M} \subseteq S \times S'$ between $A^M$ and $A'$ such that:

(A) $s_0 \mathbf{M} s'_0$, and
(B) for all $s \in S$, $s' \in S'$ with $s \mathbf{M} s'$ and all $e \in \Sigma$ the following holds:
    (1) if $(s \xrightarrow{e} t) \in E$ then $\exists t' \in S' : (s' \xrightarrow{e} t' \wedge t \mathbf{M} t')$;
    (2) if $(s' \xrightarrow{e} t') \in E'$ then $\exists t \in S : (s \xrightarrow{e} t \wedge t \mathbf{M} t')$;
    (3) if $(s' \xrightarrow{F} t')$ for some $F \in \mathcal{F}$ then $\exists t \in S : (s \xrightarrow{M} t \wedge t \mathbf{M} t')$.

If such a relation exists we say that $A'$ is a strong masking fault-tolerant
implementation of $A$, denoted by $A \preceq_m A'$.

We say that state $s'$ is masking fault-tolerant for $s$ when $s \mathbf{M} s'$. Intuitively,
the definition states that, starting in $s'$, faults can be masked in such a way
that the behavior exhibited is the same as that observed when starting from
$s$ and executing transitions without faults. In other words, a masking relation
ensures that every faulty behavior in the implementation can be simulated by the
specification. Let us explain the above definition in more detail. First, note that
conditions A, B.1, and B.2 imply that we have a bisimulation when $A$ and $A'$
do not exhibit faulty behavior. In particular, condition B.1 says that the normal
execution of $A$ can be simulated by an execution of $A'$. On the other hand,
condition B.2 says that the implementation does not add normal (non-faulty)
behavior. Finally, condition B.3 states that every outgoing faulty transition ($F$)
from $s'$ must be matched to an outgoing masking transition ($M$) from $s$.


3.2 Weak Masking Simulation 


For analysing nontrivial systems a weak version of the masking simulation relation
is needed. The main idea is that a weak masking simulation abstracts away from
internal behaviour, which is modeled by a special action $\tau$. Note that internal
transitions are common in fault-tolerance: the actions performed as part of a
fault-tolerant procedure in a component are usually not observable by the rest
of the system.

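The weak transition relation $\Rightarrow \subseteq S \times (\Sigma \cup \{\tau\} \cup \{M\} \cup \mathcal{F}) \times S$, also denoted
as $E_W$, considers the silent step $\tau$ and is defined as follows:

$$
\stackrel{e}{\Rightarrow} =
\begin{cases}
(\xrightarrow{\tau})^* \circ \xrightarrow{e} \circ (\xrightarrow{\tau})^* & \text{if } e \in \Sigma, \\
(\xrightarrow{\tau})^* & \text{if } e = \tau, \\
\xrightarrow{e} & \text{if } e \in \{M\} \cup \mathcal{F}.
\end{cases}
$$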


The symbol $\circ$ stands for composition of binary relations and $(\xrightarrow{\tau})^*$ is the reflexive
and transitive closure of the binary relation $\xrightarrow{\tau}$.

Intuitively, if $e \notin \{\tau, M\} \cup \mathcal{F}$, then $s \stackrel{e}{\Rightarrow} s'$ means that there is a sequence of
zero or more $\tau$ transitions starting in $s$, followed by one transition labelled by $e$,
followed again by zero or more $\tau$ transitions eventually reaching $s'$. $s \stackrel{\tau}{\Rightarrow} s'$ states
that $s$ can transition to $s'$ via zero or more $\tau$ transitions. In particular, $s \stackrel{\tau}{\Rightarrow} s$
for every $s$. For the case in which $e \in \{M\} \cup \mathcal{F}$, $s \stackrel{e}{\Rightarrow} s'$ is equivalent to $s \xrightarrow{e} s'$
and hence no $\tau$ step is allowed before or after the $e$ transition.
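The three cases of the weak transition relation can be computed by first closing the silent steps and then padding the visible ones. The sketch below is an illustration under assumptions: the function names, the string `"tau"` for the silent label, and the four-state example are hypothetical, and the simple fixpoint loop is not the asymptotically best closure algorithm.

```python
def tau_closure(states, edges, tau="tau"):
    """Map each state to the set of states reachable by zero or more tau steps."""
    reach = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for (u, e, t) in edges:
            if e != tau:
                continue
            for s in states:
                # if s reaches u silently, it also reaches whatever t reaches
                if u in reach[s] and not reach[t] <= reach[s]:
                    reach[s] |= reach[t]
                    changed = True
    return reach


def saturate(states, edges, special, tau="tau"):
    """Weak transitions E_W as (source, label, target) triples.

    Labels in `special` (the masking label and the faults) are copied
    unchanged; every other visible label is padded with tau steps.
    """
    reach = tau_closure(states, edges, tau)
    weak = {(s, tau, t) for s in states for t in reach[s]}
    for (u, e, t) in edges:
        if e == tau:
            continue
        if e in special:
            weak.add((u, e, t))
        else:
            for s in states:
                if u in reach[s]:
                    weak.update((s, e, t2) for t2 in reach[t])
    return weak


# Hypothetical example: p --tau--> q --a--> r --tau--> x and q --F--> x.
states = {"p", "q", "r", "x"}
edges = {("p", "tau", "q"), ("q", "a", "r"), ("r", "tau", "x"), ("q", "F", "x")}
weak = saturate(states, edges, special={"M", "F"})
```

In the example, `p` can weakly perform `a` and end in `x`, but it cannot weakly perform the fault `F`, because no silent step may precede a fault.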


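Definition 2. Let $A = \langle S, \Sigma, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$ be two transition
systems with $\Sigma$ possibly containing $\tau$. $A'$ is weak masking fault-tolerant with
respect to $A$ if there is a relation $\mathbf{M} \subseteq S \times S'$ between $A^M$ and $A'$ such that:

(A) $s_0 \mathbf{M} s'_0$, and
(B) for all $s \in S$, $s' \in S'$ with $s \mathbf{M} s'$ and all $e \in \Sigma \cup \{\tau\}$ the following holds:
    (1) if $(s \xrightarrow{e} t) \in E$ then $\exists t' \in S' : (s' \stackrel{e}{\Rightarrow} t' \in E'_W \wedge t \mathbf{M} t')$;
    (2) if $(s' \xrightarrow{e} t') \in E'$ then $\exists t \in S : (s \stackrel{e}{\Rightarrow} t \in E_W \wedge t \mathbf{M} t')$;
    (3) if $(s' \xrightarrow{F} t') \in E'$ for some $F \in \mathcal{F}$ then $\exists t \in S : (s \xrightarrow{M} t \in E^M \wedge t \mathbf{M} t')$.

If such a relation exists, we say that $A'$ is a weak masking fault-tolerant
implementation of $A$, denoted by $A \preceq^w_m A'$.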


The following theorem makes a strong connection between strong and weak
masking simulation. It states that weak masking simulation becomes strong
masking simulation whenever transition $\rightarrow$ is replaced by $\Rightarrow$ in the original
automata.


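Theorem 2. Let $A = \langle S, \Sigma, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$. $\mathbf{M} \subseteq S \times S'$
between $A^M$ and $A'$ is a weak masking simulation if and only if:
(A) $s_0 \mathbf{M} s'_0$, and
(B) for all $s \in S$, $s' \in S'$ with $s \mathbf{M} s'$ and all $e \in \Sigma \cup \{\tau\}$ the following holds:
    (1) if $(s \stackrel{e}{\Rightarrow} t) \in E_W$ then $\exists t' \in S' : (s' \stackrel{e}{\Rightarrow} t' \in E'_W \wedge t \mathbf{M} t')$;
    (2) if $(s' \stackrel{e}{\Rightarrow} t') \in E'_W$ then $\exists t \in S : (s \stackrel{e}{\Rightarrow} t \in E_W \wedge t \mathbf{M} t')$;
    (3) if $(s' \stackrel{F}{\Rightarrow} t') \in E'_W$ for some $F \in \mathcal{F}$ then $\exists t \in S : (s \stackrel{M}{\Rightarrow} t \in E^M_W \wedge t \mathbf{M} t')$.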


The proof of this is straightforward following the same ideas of Milner in [25]. 

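A natural way to check weak bisimilarity is to saturate the transition system
[14,25] and then check strong bisimilarity on the saturated transition system.
Similarly, Theorem 2 allows us to compute weak masking simulation by reducing
this problem to computing strong masking simulation. Note that $\stackrel{e}{\Rightarrow}$ can be
alternatively defined by:

$$
\frac{p \xrightarrow{e} q}{p \stackrel{e}{\Rightarrow} q}
\qquad
\frac{}{p \stackrel{\tau}{\Rightarrow} p}
\qquad
\frac{p \stackrel{\tau}{\Rightarrow} p_1 \quad p_1 \stackrel{e}{\Rightarrow} q_1 \quad q_1 \stackrel{\tau}{\Rightarrow} q}{p \stackrel{e}{\Rightarrow} q}\ (e \notin \{M\} \cup \mathcal{F})
$$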


As a running example, we consider a memory cell that stores a bit of infor- 
mation and supports reading and writing operations, presented in a state-based 
form in [11]. A state in this system maintains the current value of the memory 
cell (m = i, for i = 0,1), writing allows one to change this value, and reading 
returns the stored value. Obviously, in this system the result of a reading depends 
on the value stored in the cell. Thus, a property that one might associate with 
this model is that the value read from the cell coincides with that of the last 
writing performed in the system. 

A potential fault in this scenario occurs when a cell unexpectedly loses its 
charge, and its stored value turns into another one (e.g., it changes from 1 to 
0 due to charge loss). A typical technique to deal with this situation is redun- 
dancy: use three memory bits instead of one. Writing operations are performed 
simultaneously on the three bits. Reading, on the other hand, returns the value 
that is repeated at least twice in the memory bits; this is known as voting. 

Fig. 1. Transition systems for the memory cell.

We take the following approach to model this system. Labels $W_0$, $W_1$, $R_0$, and
$R_1$ represent writing and reading operations. Specifically, $W_0$ (resp. $W_1$) writes
a zero (resp. one) in the memory, and $R_0$ (resp. $R_1$) reads a zero (resp. one) from
the memory. Figure 1 depicts four transition systems. The leftmost one represents
the nominal system for this example (denoted as $A$). The second one from
the left characterizes the nominal transition system augmented with masking
transitions, i.e., $A^M$. The third and fourth transition systems are fault-tolerant
implementations of $A$, named $A'$ and $A''$, respectively. Note that $A'$ contains
one fault, while $A''$ considers two faults. Both implementations use triple
redundancy; intuitively, state $t_0$ contains the three bits with value zero and $t_1$ contains
the three bits with value one. Moreover, state $t_2$ is reached when one of the bits
was flipped (either 001, 010 or 100). In $A''$, state $t_3$ is reached after a second bit
is flipped (either 011 or 101 or 110) starting from state $t_0$. It is straightforward
to see that there exists a relation of masking fault-tolerance between $A^M$ and $A'$,
as witnessed by the relation $\mathbf{M} = \{(s_0, t_0), (s_1, t_1), (s_0, t_2)\}$. It is routine
to check that $\mathbf{M}$ satisfies the conditions of Definition 1. On the other hand, there
does not exist a masking relation between $A^M$ and $A''$ because state $t_3$ needs to
be related to state $s_0$ in any masking relation. This state can only be reached
by executing faults, which are necessarily masked with $M$-transitions. However,
note that, in state $t_3$, we can read a 1 (transition $t_3 \xrightarrow{R_1} t_3$) whereas, in state $s_0$,
we can only read a 0.
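Both claims about the running example can be checked mechanically by computing the largest relation that satisfies condition (B) of Definition 1 and then testing condition (A). The sketch below is only an illustration: the function name is hypothetical, and since Fig. 1 is not reproduced here, the transition sets `E`, `E1`, and `E2` are assumptions reconstructed from the textual description (in particular, faults are assumed to occur only on the path $t_0 \to t_2 \to t_3$).

```python
def strong_masking(S, E, s0, S2, E2, s0_2, sigma, faults):
    """Largest relation satisfying condition (B) of Definition 1.

    Returns the relation if it relates the initial states (condition A),
    and None otherwise.
    """
    EM = set(E) | {(s, "M", s) for s in S}   # E^M: one masking self-loop per state
    rel = {(s, t) for s in S for t in S2}    # start from everything, then prune

    def ok(s, s2):
        # (B.1) every nominal step of the specification is matched
        for (u, e, t) in E:
            if u == s and e in sigma and not any(
                    (s2, e, t2) in E2 and (t, t2) in rel for t2 in S2):
                return False
        for (u2, e, t2) in E2:
            if u2 != s2:
                continue
            if e in faults:
                # (B.3) a fault is matched by a masking step of the specification
                if not any((s, "M", t) in EM and (t, t2) in rel for t in S):
                    return False
            elif not any((s, e, t) in E and (t, t2) in rel for t in S):
                # (B.2) the implementation adds no nominal behaviour
                return False
        return True

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not ok(*pair):
                rel.discard(pair)
                changed = True
    return rel if (s0, s0_2) in rel else None


# Assumed encoding of the memory cell (nominal system A).
SIGMA = {"W0", "W1", "R0", "R1"}
S = {"s0", "s1"}
E = {("s0", "R0", "s0"), ("s0", "W0", "s0"), ("s0", "W1", "s1"),
     ("s1", "R1", "s1"), ("s1", "W1", "s1"), ("s1", "W0", "s0")}
# Assumed encoding of A' (one fault) and A'' (two faults).
E1 = {("t0", "R0", "t0"), ("t0", "W0", "t0"), ("t0", "W1", "t1"),
      ("t1", "R1", "t1"), ("t1", "W1", "t1"), ("t1", "W0", "t0"),
      ("t0", "F", "t2"),
      ("t2", "R0", "t2"), ("t2", "W0", "t0"), ("t2", "W1", "t1")}
E2 = E1 | {("t2", "F", "t3"),
           ("t3", "R1", "t3"), ("t3", "W0", "t0"), ("t3", "W1", "t1")}

one_fault = strong_masking(S, E, "s0", {"t0", "t1", "t2"}, E1, "t0", SIGMA, {"F"})
two_faults = strong_masking(S, E, "s0", {"t0", "t1", "t2", "t3"}, E2, "t0", SIGMA, {"F"})
```

Under these assumptions `one_fault` is exactly the relation given in the text, while `two_faults` is `None`: once $t_3$ cannot be related to $s_0$, the pruning propagates back to the initial pair.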


3.3 Masking Simulation Game 


We define a masking simulation game for two transition systems (the specifica- 
tion of the nominal system and its fault-tolerant implementation) that captures 
masking fault-tolerance. We first define the masking game graph where we have 
two players named by convenience the refuter (R) and the verifier (V). 


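Definition 3. Let $A = \langle S, \Sigma, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$. The strong
masking game graph $G_{A^M,A'} = \langle S^G, S_R, S_V, \Sigma^G, E^G, s_0^G \rangle$ for two players is
defined as follows:

- $\Sigma^G = \Sigma_M \cup \Sigma_{\mathcal{F}}$
- $S^G = (S \times (\Sigma^1_M \cup \Sigma^2_{\mathcal{F}} \cup \{\#\}) \times S' \times \{R, V\}) \cup \{s_{err}\}$
- The initial state is $s_0^G = \langle s_0, \#, s'_0, R \rangle$, where the refuter starts playing
- The refuter's states are $S_R = \{(s, \#, s', R) \mid s \in S \wedge s' \in S'\} \cup \{s_{err}\}$
- The verifier's states are $S_V = \{(s, \sigma, s', V) \mid s \in S \wedge s' \in S' \wedge \sigma \in \Sigma^G \setminus \{M\}\}$

and $E^G$ is the minimal set satisfying:

- $\{(s, \#, s', R) \xrightarrow{\sigma} (t, \sigma^1, s', V) \mid \exists \sigma \in \Sigma : s \xrightarrow{\sigma} t \in E\} \subseteq E^G$,
- $\{(s, \#, s', R) \xrightarrow{\sigma} (s, \sigma^2, t', V) \mid \exists \sigma \in \Sigma_{\mathcal{F}} : s' \xrightarrow{\sigma} t' \in E'\} \subseteq E^G$,
- $\{(s, \sigma^2, s', V) \xrightarrow{\sigma} (t, \#, s', R) \mid \exists \sigma \in \Sigma : s \xrightarrow{\sigma} t \in E\} \subseteq E^G$,
- $\{(s, \sigma^1, s', V) \xrightarrow{\sigma} (s, \#, t', R) \mid \exists \sigma \in \Sigma : s' \xrightarrow{\sigma} t' \in E'\} \subseteq E^G$,
- $\{(s, F^2, s', V) \xrightarrow{M} (t, \#, s', R) \mid \exists\, s \xrightarrow{M} t \in E^M\} \subseteq E^G$, for any $F \in \mathcal{F}$,
- If there is no outgoing transition from some state $s$ then transitions $s \xrightarrow{\sigma} s_{err}$
  and $s_{err} \xrightarrow{\sigma} s_{err}$ for every $\sigma \in \Sigma$ are added.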


The intuition of this game is as follows. The refuter chooses transitions of
either the specification or the implementation to play, and the verifier tries to
match her choice; this is similar to the bisimulation game [28]. However, when the
refuter chooses a fault, the verifier must match it with a masking transition ($M$).
The intuitive reading of this is that the fault-tolerant implementation masked
the fault in such a way that the occurrence of this fault cannot be noticed from
the users' side. $R$ wins if the game reaches the error state, i.e., $s_{err}$. On the
other hand, $V$ wins when $s_{err}$ is not reached during the game. (This is basically
a reachability game [26].)

A weak masking game graph $G^W_{A^M,A'}$ is defined in the same way as the strong
masking game graph in Definition 3, with the exception that $\Sigma_M$ and $\Sigma_{\mathcal{F}}$ may
contain $\tau$, and the set of labelled transitions (denoted as $E^G_W$) is now defined
using the weak transition relations (i.e., $E_W$ and $E'_W$) from the respective
transition systems.

Figure 2 shows a part of the strong masking game graph for the running
example considering the transition systems $A^M$ and $A''$. We can clearly observe
on the game graph that the verifier cannot mimic the transition
$(s_0, \#, t_3, R) \xrightarrow{R_1^2} (s_0, R_1^2, t_3, V)$ selected by the refuter, which reads a 1 at state $t_3$ on
the fault-tolerant implementation. This is because the verifier can only read a 0 at state
$s_0$. Then, $s_{err}$ is reached and the refuter wins.

As expected, there is a strong masking simulation between $A$ and $A'$ if and
only if the verifier has a winning strategy in $G_{A^M,A'}$.


Theorem 3. Let $A = \langle S, \Sigma, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$. $A \preceq_m A'$ iff the
verifier has a winning strategy for the strong masking game graph $G_{A^M,A'}$.


By Theorems 2 and 3, the result carries over to the weak masking game.


Theorem 4. Let $A = \langle S, \Sigma \cup \{\tau\}, E, s_0 \rangle$ and $A' = \langle S', \Sigma_{\mathcal{F}} \cup \{\tau\}, E', s'_0 \rangle$.
$A \preceq^w_m A'$ iff the verifier has a winning strategy for the weak masking game
graph $G^W_{A^M,A'}$.


Using the standard properties of reachability games we get the following
property.


Theorem 5. For any $A$ and $A'$, the strong (resp. weak) masking game graph
$G_{A^M,A'}$ (resp. $G^W_{A^M,A'}$) can be determined in time $O(|E^G|)$ (resp. $O(|E^G_W|)$).

Fig. 2. Part of the masking game graph for memory cell model with two faults


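The set of winning states for the refuter can be defined in a standard way
from the error state [26]. We adapt ideas in [26] to our setting. For $i, j \geq 0$, sets
$U_i^j$ are defined as follows:

$$
\begin{aligned}
U_i^0 = U_0^j &= \emptyset, \qquad U_1^1 = \{s_{err}\}, \\
U_{i+1}^{j+1} &= \{v' \mid v' \in S_R \wedge post(v') \cap U_{i+1}^{j} \neq \emptyset\} \\
&\cup \{v' \mid v' \in S_V \wedge post(v') \subseteq \textstyle\bigcup_{j' \leq j} U_{i+1}^{j'} \wedge post(v') \cap U_{i+1}^{j} \neq \emptyset \wedge \pi_2(v') \notin \mathcal{F}\} \\
&\cup \{v' \mid v' \in S_V \wedge post(v') \subseteq \textstyle\bigcup_{i' \leq i,\, j' \leq j} U_{i'}^{j'} \wedge post(v') \cap U_{i}^{j} \neq \emptyset \wedge \pi_2(v') \in \mathcal{F}\}
\end{aligned}
\tag{1}
$$

then $U^k = \bigcup_{i \geq 0} U_i^k$ and $U = \bigcup_{k \geq 0} U^k$. Intuitively, the subindex $i$ in $U_i^k$ indicates
that $s_{err}$ is reached after at most $i - 1$ faults occurred. The following lemma is
straightforwardly proven using standard techniques of reachability games [9].


Lemma 1. The refuter has a winning strategy in $G_{A^M,A'}$ (or $G^W_{A^M,A'}$) iff $s_{init} \in U^k$,
for some $k$.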
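If the fault index is ignored, the refuter's winning region is the usual attractor of the error state in a reachability game. The following sketch shows that computation on an abstract game graph; the function name, the unlabelled edge representation, and the small example are assumptions made for illustration, and the loop is the simple quadratic fixpoint rather than the linear-time algorithm behind Theorem 5.

```python
def attractor(states_p1, states_p2, edges, target):
    """States from which Player 1 can force a visit to `target`.

    edges is a set of (source, target) pairs; Player 1 owns states_p1
    and Player 2 owns states_p2.
    """
    succ = {s: set() for s in states_p1 | states_p2}
    for u, v in edges:
        succ[u].add(v)
    win = set(target)
    changed = True
    while changed:
        changed = False
        for s in (states_p1 | states_p2) - win:
            if s in states_p1 and succ[s] & win:
                # Player 1 needs only one successor inside the winning set
                win.add(s)
                changed = True
            elif s in states_p2 and succ[s] and succ[s] <= win:
                # Player 2 loses only if every successor is inside the winning set
                win.add(s)
                changed = True
    return win


# Hypothetical example: Player 1 owns a, x and t; Player 2 owns b.
p1, p2 = {"a", "x", "t"}, {"b"}
forced = attractor(p1, p2, {("a", "b"), ("a", "x"), ("b", "t"),
                            ("x", "x"), ("t", "t")}, {"t"})
# Giving Player 2 an escape edge b -> x removes both a and b from the region.
escaped = attractor(p1, p2, {("a", "b"), ("a", "x"), ("b", "t"), ("b", "x"),
                             ("x", "x"), ("t", "t")}, {"t"})
```

In the masking game the role of Player 1 is played by the refuter and the target is $\{s_{err}\}$.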


3.4 Quantitative Masking 


In this section, we extend the strong masking simulation game introduced above 
with quantitative objectives to define the notion of masking fault-tolerance dis- 
tance. Note that we use the attribute “quantitative” in a non-probabilistic sense. 


Definition 4. For transition systems $A$ and $A'$, the quantitative strong masking
game graph $Q_{A^M,A'} = \langle S^G, S_R, S_V, \Sigma^G, E^G, s_0^G, v^G \rangle$ is defined as follows:

- $G_{A^M,A'} = \langle S^G, S_R, S_V, \Sigma^G, E^G, s_0^G \rangle$ is defined as in Definition 3,
- $v^G(s \xrightarrow{e} s') = (\chi_{\mathcal{F}}(e), \chi_{s_{err}}(s'))$,

where $\chi_{\mathcal{F}}$ is the characteristic function over set $\mathcal{F}$, returning 1 if $e \in \mathcal{F}$ and 0
otherwise, and $\chi_{s_{err}}$ is the characteristic function over the singleton set $\{s_{err}\}$.


Note that the cost function returns a pair of numbers instead of a single number.
It is straightforward to encode this pair into a number, but we do not do it here for the sake
of clarity. We remark that the quantitative weak masking game graph $Q^W_{A^M,A'}$ is
defined in the same way as the game graph defined above but using the weak
masking game graph $G^W_{A^M,A'}$ instead of $G_{A^M,A'}$.

Given a quantitative strong masking game graph with the weight function
$v^G$ and a play $\rho = \rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \ldots$, for all $i \geq 0$, let $v_i = v^G(\rho_i \xrightarrow{\sigma_i} \rho_{i+1})$. We
define the masking payoff function as follows:

$$
f_m(\rho) = \lim_{n \to \infty} \frac{pr_1(v_n)}{1 + \sum_{i=0}^{n} pr_0(v_i)},
$$

which is proportional to the inverse of the number of masking movements made
by the verifier. To see this, note that the numerator of
$\frac{pr_1(v_n)}{1 + \sum_{i=0}^{n} pr_0(v_i)}$ will be
1 when we reach the error state, that is, in those paths not reaching the error
state this formula returns 0. Furthermore, if the error state is reached, then
the denominator will count the number of fault transitions taken until the
error state. All of them, except the last one, were masked successfully. The
last fault, instead, while attempted to be masked by the verifier, eventually
leads to the error state. That is, the transitions with value $(1, \_)$ are those
corresponding to faults. The others are mapped to $(0, \_)$. Notice also that if
$s_{err}$ is reached in $v_n$ without the occurrence of any fault, the nominal part of
the implementation does not match the nominal specification, in which case
$\frac{pr_1(v_n)}{1 + \sum_{i=0}^{n} pr_0(v_i)} = 1$. Then, the refuter wants to maximize the value of any run,
that is, she will try to execute faults leading to the state $s_{err}$. In contrast, the
verifier wants to avoid $s_{err}$ and then she will try to mask faults with actions
that take her away from the error state. More precisely, the value of the
quantitative strong masking game for the refuter is defined as
$val_R(Q_{A^M,A'}) = \sup_{\pi_R \in \Pi_R} \inf_{\pi_V \in \Pi_V} f_m(out(\pi_R, \pi_V))$. Analogously, the value of the game for
the verifier is defined as $val_V(Q_{A^M,A'}) = \inf_{\pi_V \in \Pi_V} \sup_{\pi_R \in \Pi_R} f_m(out(\pi_R, \pi_V))$.
Then, we define the value of the quantitative strong masking game, denoted by
$val(Q_{A^M,A'})$, as the value of the game either for the refuter or the verifier, i.e.,
$val(Q_{A^M,A'}) = val_R(Q_{A^M,A'}) = val_V(Q_{A^M,A'})$. This can be done because
quantitative strong masking games are determined, as we prove below in Theorem 6.
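As a worked instance of the payoff, consider a hypothetical play in which the refuter injects two faults, the verifier masks both, and a later nominal move cannot be matched, so that the play enters $s_{err}$ and remains there. The weight sequence below is an assumption made for illustration and is not read off the paper's figures.

```latex
% Edge weights (fault indicator, error indicator) along the play:
% fault, mask, fault, mask, unmatched read, move to s_err, then the s_err loop forever.
\[
v_0 = (1,0),\quad v_1 = (0,0),\quad v_2 = (1,0),\quad v_3 = (0,0),\quad
v_4 = (0,0),\quad v_n = (0,1) \ \text{for all } n \geq 5
\]
\[
f_m(\rho) = \lim_{n \to \infty} \frac{pr_1(v_n)}{1 + \sum_{i=0}^{n} pr_0(v_i)}
          = \frac{1}{1 + 2} = \frac{1}{3}
\]
```

A play that never reaches $s_{err}$ has $pr_1(v_n) = 0$ for every $n$, so its payoff is 0 regardless of how many faults occur.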


Definition 5. Let $A$ and $A'$ be transition systems. The strong masking distance
between $A$ and $A'$, denoted by $\delta_m(A, A')$, is defined as: $\delta_m(A, A') = val(Q_{A^M,A'})$.


We would like to remark that the weak masking distance $\delta^W_m$ is defined in the
same way for the quantitative weak masking game graph $Q^W_{A^M,A'}$. Roughly
speaking, we are interested in measuring the number of faults that can be
masked. The value of the game is essentially determined by the faulty and masking
labels on the game graph and how the players can find a strategy that leads to
(or avoids) the state $s_{err}$, independently of whether or not there are silent actions.

In the following, we state some basic properties of this kind of games. As 
already anticipated, quantitative strong masking games are determined: 


Theorem 6. For any quantitative strong masking game $Q_{A^M,A'}$ with payoff
function $f_m$:

$$
\inf_{\pi_V \in \Pi_V} \sup_{\pi_R \in \Pi_R} f_m(out(\pi_R, \pi_V)) = \sup_{\pi_R \in \Pi_R} \inf_{\pi_V \in \Pi_V} f_m(out(\pi_R, \pi_V))
$$

The value of the quantitative strong masking game can be calculated as stated
below.

Theorem 7. Let $Q_{A^M,A'}$ be a quantitative strong masking game. Then,
$val(Q_{A^M,A'}) = \frac{1}{w}$, with $w = \min\{i \mid \exists j : s_{init} \in U_i^j\}$, whenever $s_{init} \in U$,
and $val(Q_{A^M,A'}) = 0$ otherwise, where sets $U_i^j$ and $U$ are defined in Eq. (1).


Note that the sets $U_i^j$ can be calculated using a bottom-up breadth-first search
from the error state. Thus, the strategies for the refuter and the verifier can
be defined using these sets, without taking into account the history of the play.
That is, we have the following theorem:


Theorem 8. Players $R$ and $V$ have memoryless winning strategies for $Q_{A^M,A'}$.


Theorems 6, 7, and 8 apply as well to $Q^W_{A^M,A'}$. The following theorem states the
complexity of determining the value of the two types of games.


Theorem 9. The quantitative strong (weak) masking game can be determined
in time $O(|S^G| + |E^G|)$ (resp. $O(|S^G| + |E^G_W|)$).


Theorems 5 and 9 describe the complexity of solving the quantitative and
standard masking games. However, in practice, one needs to bear in mind that
$|S^G| = |S| * |S'|$ and $|E^G| = |E| + |E'|$, so constructing the game takes
$O(|S|^2 * |S'|^2)$ steps in the worst case. Additionally, for the weak games, the
transitive closure of the original model needs to be computed, which for the best
known algorithm yields $O(\max(|S|, |S'|)^{2.3727})$ [30].
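To make the construction of Definition 3 and the value of Theorem 7 concrete, the sketch below builds the reachable part of the strong masking game and computes the least number of faults with which the refuter can force $s_{err}$. It is an illustration under stated assumptions: it uses a plain value iteration in place of the $U_i^j$ sets of Eq. (1), the function name is hypothetical, and the transition sets encode the memory cell as reconstructed from the text rather than from Fig. 1.

```python
import math


def masking_distance(S, E, s0, E2, s0_2, faults):
    """Value 1/(1+k) of the strong masking game, where k is the least number of
    faults with which the refuter can force s_err (0.0 if she cannot)."""
    EM = set(E) | {(s, "M", s) for s in S}       # specification with masking loops
    ERR = "s_err"
    init = (s0, "#", s0_2, "R")

    def moves(node):
        s, mark, t, player = node
        if player == "R":                        # refuter: a spec or an impl step
            out = [(0, (u, (e, 1), t, "V")) for (x, e, u) in E if x == s]
            out += [(1 if e in faults else 0, (s, (e, 2), u, "V"))
                    for (x, e, u) in E2 if x == t]
            return out
        e, side = mark
        if side == 1:                            # spec moved: impl must answer with e
            return [(0, (s, "#", u, "R")) for (x, f, u) in E2 if x == t and f == e]
        label = "M" if e in faults else e        # a fault is answered by masking
        return [(0, (u, "#", t, "R")) for (x, f, u) in EM if x == s and f == label]

    succ = {ERR: [(0, ERR)]}                     # node -> [(fault weight, successor)]
    stack = [init]
    while stack:
        node = stack.pop()
        if node in succ:
            continue
        succ[node] = moves(node) or [(0, ERR)]   # stuck nodes move to s_err
        stack.extend(n for _, n in succ[node])

    cost = {v: math.inf for v in succ}
    cost[ERR] = 0
    changed = True
    while changed:                               # refuter minimises, verifier maximises
        changed = False
        for v, out in succ.items():
            if v == ERR:
                continue
            vals = [w + cost[n] for w, n in out]
            new = min(vals) if v[3] == "R" else max(vals)
            if new < cost[v]:
                cost[v] = new
                changed = True
    k = cost[init]
    return 0.0 if k == math.inf else 1.0 / (1 + k)


# Assumed encoding of the memory cell and its two implementations.
S = {"s0", "s1"}
E = {("s0", "R0", "s0"), ("s0", "W0", "s0"), ("s0", "W1", "s1"),
     ("s1", "R1", "s1"), ("s1", "W1", "s1"), ("s1", "W0", "s0")}
E1 = {("t0", "R0", "t0"), ("t0", "W0", "t0"), ("t0", "W1", "t1"),
      ("t1", "R1", "t1"), ("t1", "W1", "t1"), ("t1", "W0", "t0"),
      ("t0", "F", "t2"),
      ("t2", "R0", "t2"), ("t2", "W0", "t0"), ("t2", "W1", "t1")}
E2 = E1 | {("t2", "F", "t3"),
           ("t3", "R1", "t3"), ("t3", "W0", "t0"), ("t3", "W1", "t1")}

d_one = masking_distance(S, E, "s0", E1, "t0", {"F"})
d_two = masking_distance(S, E, "s0", E2, "t0", {"F"})
```

Under these assumptions the one-fault implementation is at distance 0 and the two-fault implementation at distance 1/3. Only the reachable part of the game is built, which in practice is often much smaller than the worst-case bound discussed above.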

By using $Q^W_{A^M,A'}$ instead of $Q_{A^M,A'}$ in Definition 5, we can define the weak
masking distance $\delta^W_m$. The next theorem states that, if $A$ and $A'$ are at distance
0, there is a strong (or weak) masking simulation between them.


Theorem 10. For any transition systems $A$ and $A'$, (i) $\delta_m(A, A') = 0$ iff
$A \preceq_m A'$, and (ii) $\delta^W_m(A, A') = 0$ iff $A \preceq^w_m A'$.


This follows from Theorem 7. Noting that $A \preceq_m A$ (and $A \preceq^w_m A$) for any
transition system $A$, we obtain that $\delta_m(A, A) = 0$ (resp. $\delta^W_m(A, A) = 0$) by
Theorem 10, i.e., both distances are reflexive.

For our running example, the masking distance is 1/3 with a redundancy of 
3 bits and considering two faults. This means that only one fault can be masked 
by this implementation. We can prove a version of the triangle inequality for our 
notion of distance. 




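Theorem 11. Let $A = \langle S, \Sigma, E, s_0 \rangle$, $A' = \langle S', \Sigma_{\mathcal{F}'}, E', s'_0 \rangle$, and
$A'' = \langle S'', \Sigma_{\mathcal{F}''}, E'', s''_0 \rangle$ be transition systems such that $\mathcal{F}' \subseteq \mathcal{F}''$. Then
$\delta_m(A, A'') \leq \delta_m(A, A') + \delta_m(A', A'')$ and $\delta^W_m(A, A'') \leq \delta^W_m(A, A') + \delta^W_m(A', A'')$.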


Reflexivity and the triangle inequality imply that both masking distances are 
directed semi-metrics [7,10]. Moreover, it is interesting to note that the triangle 
inequality property has practical applications. When developing critical software
it is quite common to develop a first version of the software taking into account
some possible anticipated faults. Later, after testing and running the system,
more plausible faults could be observed. Consequently, the system is modified 
with additional fault-tolerant capabilities to be able to overcome them. Theo- 
rem 11 states that incrementally measuring the masking distance between these 
different versions of the software provides an upper bound to the actual distance 
between the nominal system and its last fault-tolerant version. That is, if the 
sum of the distances obtained between the different versions is a small number, 
then we can ensure that the final system will exhibit an acceptable masking 
tolerance to faults w.r.t. the nominal system. 


4 Experimental Evaluation 


The approach described in this paper has been implemented in a Java tool
called MaskD: Masking Distance Tool [1]. MaskD takes as input a nominal model
and its fault-tolerant implementation, and produces as output the masking
distance between them. The input models are specified using the guarded
command language introduced in [3], a simple programming language commonly used for
describing fault-tolerant algorithms. More precisely, a program is a collection of
processes, where each process is composed of a collection of actions of the form
Guard → Command, where Guard is a boolean condition over the current state
of the program and Command is a collection of basic assignments. These
syntactical constructions are called actions. The language also allows the user to label
an action as internal (i.e., $\tau$ actions). Moreover, usually some actions are used
to represent faults. The tool has several additional features; for instance, it can
print the traces to the error state or start a simulation from the initial state.
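The way a set of guarded commands induces a transition system can be sketched as follows. This is not MaskD's input syntax, which is not shown here: the triple `(label, guard, command)`, the function `explore`, and the triple-redundancy model with a single-bit-flip fault are all hypothetical and serve only to illustrate the Guard → Command reading described above.

```python
def explore(init, actions):
    """Unfold guarded commands into a labelled transition system.

    init is a dict of variable values; actions is a list of
    (label, guard, command) triples, where guard maps a state to a bool and
    command maps a state to a dict of assignments.
    """
    def freeze(st):
        return tuple(sorted(st.items()))

    start = freeze(init)
    states, edges, stack = {start}, set(), [start]
    while stack:
        cur = stack.pop()
        st = dict(cur)
        for label, guard, command in actions:
            if guard(st):                       # the action is enabled in st
                nxt = freeze({**st, **command(st)})
                edges.add((cur, label, nxt))
                if nxt not in states:
                    states.add(nxt)
                    stack.append(nxt)
    return states, edges, start


def majority(st):
    """Value held by at least two of the three bits."""
    return int(st["b0"] + st["b1"] + st["b2"] >= 2)


# Illustrative model: writes set all three bits, reads return the majority,
# and the fault F flips bit b0 when all bits agree.
actions = [
    ("W0", lambda st: True, lambda st: {"b0": 0, "b1": 0, "b2": 0}),
    ("W1", lambda st: True, lambda st: {"b0": 1, "b1": 1, "b2": 1}),
    ("R0", lambda st: majority(st) == 0, lambda st: {}),
    ("R1", lambda st: majority(st) == 1, lambda st: {}),
    ("F", lambda st: st["b0"] == st["b1"] == st["b2"],
     lambda st: {"b0": 1 - st["b0"]}),
]
states, edges, start = explore({"b0": 0, "b1": 0, "b2": 0}, actions)
```

The unfolding yields four reachable states (000, 111, and the two single-flip states), which can then be handed to any of the checks on explicit transition systems.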

We report in Table 1 the results of the masking distance for multiple instances
of several case studies. These are: a Redundant Cell Memory (our running exam- 
ple), N-Modular Redundancy (a standard example of fault-tolerant system [27]), 
a variation of the Dining Philosophers problem [13], the Byzantine Generals 
problem introduced by Lamport et al. [22], and the Bounded Retransmission 
Protocol (a well-known example of fault-tolerant protocol [16]). 

Some words are useful to interpret the results. For the case of a 3-bit
memory the masking distance is 0.333. The main reason for this is that the faulty
model in the worst case is only able to mask 2 faults (in this example, a fault
is an unexpected change of a bit value) before failing to replicate the
nominal behaviour (i.e., reading the majority value); thus the result comes from the
definition of masking distance, taking into account the occurrence of two
faults. The situation is similar for the other instances of this problem with more
redundancy.

N-Modular Redundancy consists of N systems that each perform a process
and whose results are processed by a majority-voting system to produce a
single output. Assuming a single perfect voter, we have evaluated this case study
for different numbers of modules. Note that the distance measures for this case
study are similar to the memory example.

For the dining philosophers problem we have adopted the odd/even philosophers
implementation (it prevents deadlock), i.e., there are n − 1 even
philosophers that pick the right fork first, and 1 odd philosopher that picks the
left fork first. The fault we consider in this case occurs when an even
philosopher behaves as an odd one; this could be the case of a Byzantine fault. For two
philosophers the masking distance is 0.5 since a single fault leads to a deadlock;
when more philosophers are added this distance becomes smaller.

Another interesting example of a fault-tolerant system is the Byzantine
generals problem, introduced originally by Lamport et al. [22]. This is a consensus
problem, where we have a general with n − 1 lieutenants. The communication
between the general and his lieutenants is performed through messengers. The
general may decide to attack an enemy city or to retreat; then, he sends the order
to his lieutenants. Some of the lieutenants might be traitors. We assume that
the messages are delivered correctly and all the lieutenants can communicate
directly with each other. In this scenario they can recognize who is sending a
message. Faults can convert loyal lieutenants into traitors (Byzantine faults). As
a consequence, traitors might deliver false messages or perhaps avoid sending
a message that they received. The loyal lieutenants must agree on attacking
or retreating after m + 1 rounds of communication, where m is the maximum
number of traitors.

The Bounded Retransmission Protocol (BRP) is a well-known industrial case
study in software verification. While all the other case studies were treated as toy
examples and analyzed with $\delta_m$, the BRP was modeled closer to the
implementation following [16], considering the different components (sender, receiver, and
models of the channels). To analyze such a complex model we have used instead
the weak masking distance $\delta^W_m$. We have calculated the masking distance for
the bounded retransmission protocol with 1, 3 and 5 chunks, denoted BRP(1),
BRP(3) and BRP(5), respectively. We observe that the distance values are not
affected by the number of chunks to be sent by the protocol. This is expected
because the masking distance depends on the redundancy added to mask the
faults, which in this case depends on the number of retransmissions.

Table 1. Results of the masking distance for the case studies.

| Case Study           | Redundancy  | Masking Distance | Time   |
|----------------------|-------------|------------------|--------|
| Memory               | 3 bits      | 0.333            | 0.7s   |
| Memory               | 5 bits      | 0.25             | 1.5s   |
| Memory               | 7 bits      | 0.2              | 27s    |
| Memory               | 9 bits      | 0.167            | 34m33s |
| N-Modular Redundancy | 3 modules   | 0.333            | 0.3s   |
| N-Modular Redundancy | 5 modules   | 0.25             | 0.5s   |
| N-Modular Redundancy | 7 modules   | 0.2              | 31.7s  |
| N-Modular Redundancy | 9 modules   | 0.167            | 115m   |
| Philosophers         | 2 phils     | 0.5              | 0.3s   |
| Philosophers         | 3 phils     | 0.333            | 0.65s  |
| Philosophers         | 4 phils     | 0.25             | 7.1s   |
| Philosophers         | 5 phils     | 0.2              | 13m53s |
| Byzantines           | 3 generals  | 0.5              | 0.5s   |
| Byzantines           | 4 generals  | 0.333            | 2s     |
| BRP(1)               | 1 retransm. | 0.333            | 1.2s   |
| BRP(1)               | 3 retransm. | 0.2              | 1.4s   |
| BRP(1)               | 5 retransm. | 0.143            | 1.5s   |
| BRP(1)               | 7 retransm. | 0.111            | 2.1s   |
| BRP(3)               | 1 retransm. | 0.333            | 5.5s   |
| BRP(3)               | 3 retransm. | 0.2              | 14.9s  |
| BRP(3)               | 5 retransm. | 0.143            | 1m28s  |
| BRP(3)               | 7 retransm. | 0.111            | 4m40s  |
| BRP(5)               | 1 retransm. | 0.333            | 6.7s   |
| BRP(5)               | 3 retransm. | 0.2              | 32s    |
| BRP(5)               | 5 retransm. | 0.143            | 1m51s  |
| BRP(5)               | 7 retransm. | 0.111            | 6m35s  |

We have run our experiments on a MacBook Air with a 1.3 GHz
Intel Core i5 processor and 4 GB of memory. The tool and case studies for reproducing
the results are available in the tool repository.


5 Related Work 


In recent years, there has been a growing interest in the quantitative general- 
izations of the boolean notion of correctness and the corresponding quantitative 
verification questions [4,6,17,18]. The framework described in [6] is the clos- 
est related work to our approach. The authors generalize the traditional notion 
of simulation relation to three different versions of simulation distance: cor- 
rectness, coverage, and robustness. These are defined using quantitative games 
with discounted-sum and mean-payoff objectives, two well-known cost functions. 
Similarly to that work, we also consider distances between purely discrete
(non-probabilistic, untimed) systems. Correctness and coverage distances are
concerned with the nominal part of the systems, and so faults play no role in them.
On the other hand, robustness distance measures how many unexpected errors
can be performed by the implementation in such a way that the resulting
behavior is tolerated by the specification. So, it can be used to analyze the resilience
of the implementation. Note that robustness distance can only be applied to
correct implementations, that is, implementations that preserve the behavior
of the specification but perhaps do not cover all its behavior. As noted in [6],
bisimilarity sometimes implies a distance of 1. In this sense a greater grade of 
robustness (as defined in [6]) is achieved by pruning critical points from the 
specification. Furthermore, the errors considered in that work are transitions 
mimicking the original ones but with different labels. In contrast to this, in our 
approach we consider that faults are injected into the fault-tolerant implementa- 
tion, where their behaviors are not restricted by the nominal system. This follows 
the idea of model extension in fault-tolerance where faulty behavior is added to 
the nominal system. Further, note that when no faults are present, the mask- 
ing distance between the specification and the implementation is 0 when they 
are bisimilar, and it is 1 otherwise. It is useful to note that robustness distance 
of [6] is not reflexive. We believe that all these definitions of distance between
systems capture different notions useful for software development, and they can
be used together, in a complementary way, to obtain an in-depth evaluation of
fault-tolerant implementations.

6 Conclusions and Future Work


In this paper, we presented a notion of masking fault-tolerance distance between 
systems built on a characterization of masking tolerance via simulation rela- 
tions and a corresponding game representation with quantitative objectives. Our 
framework is well-suited to support engineers for the analysis and design of fault- 
tolerant systems. More precisely, we have defined a computable masking distance 
function such that an engineer can measure the masking tolerance of a given 
fault-tolerant implementation, i.e., the number of faults that can be masked. 
Thereby, the engineer can measure and compare the masking fault-tolerance 
distance of alternative fault-tolerant implementations, and select the one that
best fits her preferences.

There are many directions for future work. We have only defined a notion of
fault-tolerance distance for masking fault-tolerance; similar notions of distance
can be defined for other levels of fault-tolerance, such as failsafe and non-masking.
Also, we have focused on non-quantitative models. However, metrics defined on 
probabilistic models, where the rate of fault occurrences is explicitly represented, 
could give a more accurate notion of fault tolerance. 
Abstract. Static program analysis is used to automatically determine
program properties, or to detect bugs or security vulnerabilities in pro- 
grams. It can be used as a stand-alone tool or to aid compiler optimiza- 
tion as an intermediary step. Developing precise, inter-procedural static 
analyses, however, is a challenging task, due to the algorithmic complex- 
ity, implementation effort, and the threat of state explosion which leads 
to unsatisfactory performance. Software written in C and C++ is noto- 
riously hard to analyze because of the deliberately unsafe type system, 
unrestricted use of pointers, and (for C++) virtual dispatch. In this 
work, we describe the design and implementation of the LLVM-based 
static analysis framework PhASAR for C/C++ code. PhASAR allows 
data-flow problems to be solved in a fully automated manner. It pro- 
vides class hierarchy, call-graph, points-to, and data-flow information, 
hence requiring analysis developers only to specify a definition of the 
data-flow problem. PhASAR thus hides the complexity of static analysis 
behind a high-level API, making static program analysis more accessible 
and easy to use. PhASAR is available as an open-source project. We 
evaluate PhASAR’s scalability during whole-program analysis. Analyz- 
ing 12 real-world programs using a taint analysis written in PhASAR, 
we found PhASAR’s abstractions and their implementations to provide a 
whole-program analysis that scales well to real-world programs. Further- 
more, we peek into the details of analysis runs, discuss our experience 
in developing static analyses for C/C++, and present possible future 
improvements. Data or code related to this paper is available at: [34]. 


Keywords: Inter-procedural static analysis · LLVM · C/C++


1 Introduction 


Programming languages from the C/C++ family are chosen as the implementation
language in a multitude of projects, especially in cases where a direct
interface with the operating system or hardware components is of importance.
Large portions of any operating system and virtual machine (such as the Java
VM) are written in C or C++. The reason for this is oftentimes the amount
of control the programmer has over many aspects, which allows for the creation
of very efficient programs, but also comes with the obligation to use these
features correctly to avoid introducing bugs or opening the program to security
vulnerabilities.

To aid developers in creating correct and secure software, a multitude of 
checks have been included into compilers such as GCC [4] and Clang [2]. Var- 
ious additional tools such as Cppcheck [12], clang-tidy [9], or the Clang Static 
Analyzer [8] provide additional means to check for unwanted behavior. Compiler- 
check passes and additional checkers both use static program analysis to provide 
warnings to their users. However, to create warnings in a timely fashion, these 
tools use comparatively simple analyses that provide either only checks for sim- 
ple properties, or suffer from a large number of false or missed warnings, due to 
the imprecision or unsoundness of the used analysis. 

For programs written in Java, program-analysis frameworks like Soot [16], 
WALA [33], and Doop [13] are available which allow for a more precise data-flow 
analysis to determine more intricate program problems. Furthermore, algorithmic
frameworks such as Interprocedural Finite Distributive Subset (IFDS) [24],
Interprocedural Distributive Environments (IDE) [26], or Weighted Pushdown
Systems (WPDS) [25] can be used to describe data-flow problems and efficiently
compute their possible solutions.

So far, such implementations have not been openly available for programs 
written in C/C++. This work thus presents the novel program-analysis frame- 
work PhASAR, an extension to the LLVM compiler infrastructure [17]. In its 
inception, we used our experience in developing previous such frameworks for 
JVM-based languages, namely Soot [16] and OPAL [14], to design a flexible 
framework that can be adapted to several different types of client analyses. 
Besides solving data-flow problems, PhASAR can be used to achieve other 
related goals as well, for instance, call-graph construction, or the computation 
of points-to information. Its features can be used independently and be included 
into other software. PhASAR’s implementation is written entirely in C++ and 
is available as open source under the permissive MIT license [23]. 

PhASAR is intended to be used as a static analyzer. Therefore, it does not
substitute but complements features from the LLVM toolchain and also provides
for analyses which would be prohibitively expensive during compilation.

This paper makes the following contributions: 


- it provides a user-centric description of PhASAR’s architecture, its
  infrastructure, and data-flow solvers,

- it presents a case study that shows PhASAR’s overall scalability as well as
  the precise runtimes of a concrete static analysis, and

- it discusses our experience in developing static analyses for C/C++.


2 Related Work 


There are several established and well-maintained tools and frameworks for the
Java ecosystem. Frameworks from academia include Soot [16], which is a static
analysis framework that allows call-graph construction, computation of points-to
information, and solving of data-flow problems for Java and Android. Soot
does not support inter-procedural data-flow analyses directly. However, a user
can solve such problems using the Heros [7] extension, which implements an
IFDS/IDE solver. The WALA [33] framework provides similar functionalities for
Java bytecode, JavaScript, and Python. OPAL [14] allows for the implementation of
abstract interpretations of Java bytecode and also supports the manipulation of
bytecode. A declarative approach is implemented by the Doop framework [13].
Doop uses a declarative rule set to encode an analysis and solves it using a
logic-based Datalog solver. The framework allows for pointer analysis of Java
programs and implements a range of algorithms that can be used for
context-insensitive, call-site-sensitive, and object-sensitive analyses.

Tooling for C/C++ includes Cppcheck [12], which aims for results without
false positives and allows encoding simple rules as well as developing
more powerful add-ons. The clang-tidy tool [9] provides built-in checks for
style validation, detection of interface misuse as well as bug-finding using simple 
rules, but can be extended by a user. Checks can be written on preprocessor level 
using callbacks or on AST level using AST matchers that can be specified using 
an embedded domain specific language (EDSL). The Clang Static Analyzer [8] 
uses symbolic execution and allows custom checks to be written. The SVF [31] 
framework computes points-to information for constructing sparse value flow and 
memory static single assignment (SSA). Hence, it can be used for analyses that
rely on that information, such as memory leak detection or null pointer analysis.
Additionally, more precise pointer analyses can be built on top of SVF’s results.
However, as the computation of memory SSA does require a significant amount 
of computation, using SVF may not pay off for problems that can be encoded 
using distributive frameworks, which allow fast, summary-based solutions. 

There are also commercial, closed-source tools for static analysis such as 
CodeSonar [10] and Coverity [11], both of which support analyses for C, C++, 
Java and other languages. Whereas these products are attractive to industry as 
they provide polished user interfaces, they are not usable for evaluating novel 
algorithms and ideas in static-analysis research. 


3 Data-Flow Analysis 


Data-flow analysis is a form of static analysis which works by propagating infor- 
mation about the property of interest—the data-flow facts—through a model of 
the program, typically a control-flow graph, and captures the interactions of the 
flow facts with the program. The interaction of a single statement s with a data- 
flow fact is described by a flow function. There are two orthogonal approaches [27] 
that can be used in order to solve inter-procedural (whole program) data-flow 
problems: the call-strings and functional approach. For the call-strings approach 
we refer the reader to related work [15,27]. In the following we briefly present 
the functional approach using a linear constant propagation that we apply to 
a small program shown in Listing 1.1. A linear constant propagation is a data- 
flow analysis that precisely tracks variables with constant values and variables 
that linearly (c = a · x + b, with a, b constant values) depend on constant values
through the program. Non-linear dependencies are over-approximated. In
our example, we restrict the analysis to keep track of integer constants only. 
Such an analysis can be used to perform program optimizations by replacing 
variables with their constant values, and folding expressions that use constant 
values, eventually possibly also removing dead code. The analysis would be able 
to optimize the program shown in Listing 1.1 to int main() {return 12; }. 


int inc(int p) { return ++p; }
int main() {
  int a = 1;
  int b = 2;
  int c = 3;
  a = inc(a);  // call site cs1
  b = inc(b);  // call site cs2
  c = b * 4;
  return c;
}
Listing 1.1. Program P


If the flow functions of the problem to be solved are monotone and distributive
over the merge operator, the problem can be encoded using Inter-procedural Finite
Distributive Subset (IFDS) or Inter-procedural Distributive Environments (IDE).
Unlike the call-strings approach, which is limited to a certain level of
context-sensitivity (commonly denoted as k), IFDS and IDE are fully
context-sensitive, i.e., k = ∞. In IFDS [24] and its generalization IDE [26], a
data-flow problem is transformed into a graph reachability problem. The
reachability is computed using the so-called exploded super-graph (ESG). If a
node (s_i, d_i) in the ESG is reachable from a special tautological node Λ, the
data-flow fact represented by d_i holds at statement s_i. The ESG is built
according to the flow functions, which can be represented as bipartite graphs.
Functions for generating (Gen) and destroying (Kill) data-flow facts can be
encoded into flow functions, making the framework compatible with more
traditional approaches to data-flow analysis. The composition f ∘ g of two
functions can be computed by composing their corresponding bipartite graphs,
i.e., merging the nodes of g with the corresponding nodes of the domain of f.
The ESG for the complete program is constructed by replacing every node of the
inter-procedural control-flow graph (ICFG) with the graph representation of the
corresponding flow function. Scalability issues due to context-sensitivity are
mitigated through summaries that are computed by composition of all bipartite
graphs of a function for a given input. These summaries are reused for
subsequent calls to an already summarized function.

The complexity of the IFDS algorithm is O(|N| · |D|³), where |N| is the
number of nodes of the ICFG (or number of program statements) and |D| the
size of the data-flow domain that is used. To make the analysis scale, the domain
D should thus be kept small.

In IDE, a generalization of the IFDS framework, the edges of the ESG are 
additionally annotated with so-called edge functions. With the help of those 
edge functions, an additional value-computation problem can be encoded, which 
is solved while performing the reachability computation. The complexity of the 
IDE algorithm is the same as for the IFDS algorithm. Many problems can be
solved more efficiently by encoding them with IDE rather than IFDS, because 
IDE uses two domains to solve a given problem. In addition to the domain D of 
the data-flow facts, the value computation problem is formulated over a second 
value domain V, which can be large, even infinite. Crucially, for a given fixed-size 
program, the complexity of both IFDS and IDE depends only on the size of D. 

Let us consider a linear constant propagation to be performed on the example
program shown in Listing 1.1. Using IFDS, the data-flow domain can be encoded
as the set V × ℤ of pairs of program variables and integer values. However, this
strategy leads to a huge domain D and prevents the generation of effective
summaries. For each call to inc() in Listing 1.1 with a different input value, a
new summary must be generated. In the example, we would obtain summaries
{(p, 1) ↦ (<ret>, 2)} for call site cs1 and {(p, 2) ↦ (<ret>, 3)} for cs2.
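
The effect on the summary table can be sketched as follows. The table layout and the function `summarizeInc` are assumptions made for this illustration; they only mimic how one summary per input fact accumulates under the IFDS encoding.

```cpp
#include <map>
#include <string>
#include <utility>

// A data-flow fact of the IFDS encoding: (variable, integer value).
using IFDSFact = std::pair<std::string, int>;

// Summary table for inc(): one entry per distinct input fact.
// (Hypothetical structure, for illustration only.)
std::map<IFDSFact, IFDSFact> IncSummaries;

// Look up the summary for a call inc(Value); compute and cache it if this
// input value has not been seen before.
IFDSFact summarizeInc(int Value) {
  IFDSFact In{"p", Value};
  auto It = IncSummaries.find(In);
  if (It == IncSummaries.end()) {
    It = IncSummaries.emplace(In, IFDSFact{"<ret>", Value + 1}).first;
  }
  return It->second;
}
```

Calling `summarizeInc(1)` for cs1 and `summarizeInc(2)` for cs2 leaves two entries in the table; every further distinct argument value would add another one.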

With IDE, the problem can be encoded in a more elegant and efficient way,
by using V as the data-flow domain and ℤ as the value domain. The ESG for
a linear constant propagation performed on Listing 1.1 using IDE is shown in
Fig. 1. As the context-dependent part of the analysis is encoded using the edge
functions, only one summary is generated for the inc() function, λx. x + 1.

Performing a reachability check on the ESG for variable c at line 9, one finds 
that c can be replaced by the literal 12. Because the return statement is the 
program’s only observable effect, all other statements can be safely removed. 
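
As a minimal sketch of this value computation, the edge functions on the path to the return statement can be modeled as linear functions and composed. The `Linear` type and the helper names are our own assumptions for this example, not PhASAR's edge-function classes.

```cpp
// A linear edge function  λx. A*x + B  over the integers.
// (Hypothetical helper type for illustration.)
struct Linear {
  int A;
  int B;
  int operator()(int X) const { return A * X + B; }
};

// Composition "first Inner, then Outer":
// Outer(Inner(x)) = Outer.A * (Inner.A * x + Inner.B) + Outer.B.
Linear compose(const Linear &Outer, const Linear &Inner) {
  return Linear{Outer.A * Inner.A, Outer.A * Inner.B + Outer.B};
}

// The single, context-independent summary of inc(): λx. x + 1.
const Linear IncSummary{1, 1};

// Value of c at the return statement of main() in Listing 1.1.
int valueOfCAtReturn() {
  const int B = 2;              // int b = 2;
  const Linear TimesFour{4, 0}; // c = b * 4;
  // b = inc(b); c = b * 4;  gives  λx. 4 * (x + 1), applied to the constant 2.
  const Linear Path = compose(TimesFour, IncSummary);
  return Path(B);
}
```

The same `IncSummary` serves both call sites: applied to 1 it yields the new value of a, applied to 2 the new value of b, so no per-input summary is needed.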


Fig. 1. Exploded super graph for the program P in Listing 1.1
4 Architecture


Precise data-flow analysis requires information from multiple supporting analy- 
ses which are typically run earlier, such as class-hierarchy, call-graph, and points- 
to analysis. Algorithmic frameworks like IFDS provide a generalized algorithm 
that is then parameterized for each individual data-flow problem. The infrastruc- 
ture provided by these basic analyses and algorithmic frameworks is necessary 
to allow analysis designers to efficiently concentrate on the goal of a data-flow 
analysis. PhASAR is the first framework to provide such infrastructure for pro- 
grams written in the C/C++ language family. Its infrastructure is designed 
modularly, such that analysis developers can choose the components necessary 
for their individual goals. In Fig. 2 we present the high-level architecture of the 
framework. 

We allow PhASAR to be used in multiple ways. The first (and easiest) way is
through its command-line interface. Its implementation [...] analyses such as
call-graph construction or pointer analysis or run pre-defined IFDS/IDE-based
analyses. The output of these analyses can then be processed using other tooling
or presented to the user directly.

Fig. 2. PhASAR’s high-level architecture

The command-line interface can also be extended with custom analyses, pro- 
vided as separately compiled plugins. Currently, custom control-flow or call- 
graph analyses and custom data-flow analyses can be packaged in this way. 
The command-line interface acts as the runtime for these plugins and delegates 
control to the plugin at the appropriate times providing necessary information. 
Plugin providers need to create an implementation of a pre-defined C++ class 
wrapping their analysis code. The plugin is compiled separately and then pro- 
vided to PhASAR in form of a shared object library. 

PhASAR can also be included into other tools by using it as a library. This 
way of using PhASAR provides the most flexibility as developers can freely select 
the components that should be part of an analysis and can reuse even parts of 
the components provided by the framework. 

PhASAR allows analysis developers to specify arbitrary data-flow problems,
which are then solved in a fully-automated manner on the specified LLVM IR
target code. Solving a static analysis problem on the IR rather than the source
language makes the analysis generally easier. This is because it removes the
dependency on the concrete source language, and because the IR is usually
simpler: it involves no nesting and has fewer instructions. Various compiler
front-ends for a wide range of languages targeting LLVM IR exist. Hence, PhASAR
is able to analyze programs written in languages other than C/C++, too. The
framework computes all information required to perform an analysis, such as
points-to, call-graph, and type-hierarchy information, and provides additional
parameterizable taint and typestate analyses.

PhASAR provides various capabilities and interfaces to compute data-flow 
problems or aid other types of analyses. First, the framework contains inter- 
faces and implementations for the computation of an ICFG; we provide some 
parameterizable implementations for the LLVM IR. 

Next, PhASAR currently supports the computation of function-wise points-to
information using LLVM’s implementations of the Andersen-style [6] or
Steensgaard-style [30] algorithms. Points-to information and ICFG computation
can be combined to obtain more precise results. We discuss the quality of points- 
to information and our current efforts to improve their quality in Sect. 8. 

To resolve virtual function calls in C++, we provide means to construct a 
type hierarchy. We construct the type hierarchy for composite types and recon- 
struct the virtual-method tables from the IR, which together with the hierarchy 
information allow PhASAR to resolve potential call targets at a given call-site. 
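
The resolution of call targets from hierarchy information can be sketched as follows. The model is deliberately simplified and rests on our own assumptions: single inheritance only, methods identified by name, and a hypothetical `TypeHierarchy` struct that is not PhASAR's data structure.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified class-hierarchy model (hypothetical; single inheritance only).
struct TypeHierarchy {
  std::map<std::string, std::string> Parent;                // type -> base
  std::map<std::string, std::vector<std::string>> Children; // type -> subtypes
  std::map<std::string, std::set<std::string>> Defines;     // type -> methods

  void addType(const std::string &Type, const std::string &Base,
               std::set<std::string> Methods) {
    if (!Base.empty()) {
      Parent[Type] = Base;
      Children[Base].push_back(Type);
    }
    Defines[Type] = std::move(Methods);
  }

  // Walk up the hierarchy to the closest type that defines Method; this
  // plays the role of a lookup in the virtual-method table of Type.
  std::string dispatch(std::string Type, const std::string &Method) const {
    while (true) {
      auto D = Defines.find(Type);
      if (D != Defines.end() && D->second.count(Method) > 0) {
        return Type + "::" + Method;
      }
      auto P = Parent.find(Type);
      if (P == Parent.end()) {
        return "";
      }
      Type = P->second;
    }
  }

  // All potential targets of a virtual call on a receiver whose static
  // type is Type: dispatch for Type and for every transitive subtype.
  std::set<std::string> resolve(const std::string &Type,
                                const std::string &Method) const {
    std::set<std::string> Targets;
    std::vector<std::string> Worklist{Type};
    while (!Worklist.empty()) {
      std::string Current = Worklist.back();
      Worklist.pop_back();
      std::string Target = dispatch(Current, Method);
      if (!Target.empty()) {
        Targets.insert(Target);
      }
      auto C = Children.find(Current);
      if (C != Children.end()) {
        Worklist.insert(Worklist.end(), C->second.begin(), C->second.end());
      }
    }
    return Targets;
  }
};
```

A call-graph algorithm can use such a target set directly or narrow it further with points-to information about the receiver.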

PhASAR provides implementations of IDE and IFDS solvers as described by 
Reps et al. [24] including the extensions of Naeem et al. [20]. We implemented 
IFDS as a specialization of IDE using a binary lattice that consists only of a top
and a bottom element, much like the Heros implementation [7]. Both solvers are
accompanied by a corresponding interface for problem definition. To solve a 
data-flow problem using the IDE or IFDS solver, the data-flow problem must be 
encoded by implementing this interface. We present this in detail in Sect. 5. 
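
How a two-element lattice turns an IDE solver into an IFDS solver can be sketched as follows. The sketch is modeled on our understanding of Heros-style solvers and is an assumption, not a reproduction of PhASAR's code; the names `BinaryDomain`, `allBottom`, and `holds` are chosen for this example.

```cpp
// Two-element value lattice used when IFDS is run on top of an IDE solver.
enum class BinaryDomain { Top, Bottom };

// Join: the result is Top only if both inputs are Top.
BinaryDomain join(BinaryDomain Lhs, BinaryDomain Rhs) {
  return (Lhs == BinaryDomain::Top && Rhs == BinaryDomain::Top)
             ? BinaryDomain::Top
             : BinaryDomain::Bottom;
}

// Every ESG edge carries the same edge function, which maps any value to
// Bottom. The value computation therefore adds no information beyond
// reachability.
BinaryDomain allBottom(BinaryDomain) { return BinaryDomain::Bottom; }

// A fact is reported as holding at a node exactly when the value computed
// for it there is Bottom, i.e. when the node was reached in the ESG.
bool holds(BinaryDomain Value) { return Value == BinaryDomain::Bottom; }
```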

For non-distributive data-flow problems, PhASAR provides an implementation
of the traditional monotone framework, which allows one to solve
intra-procedural problems. The framework provides an inter-procedural version as
well that uses a user-specified context in order to differentiate calling-contexts. 
PhASAR provides a context interface and implementations of this interface that 
realize the call-strings and value-based approach VASCO [22], in which context- 
sensitivity is achieved by reusing information that has been computed for previ- 
ous calls under the same context. The framework also implements a version of 
the context class to represent a null context. This context has the same effect as 
applying the monotone framework directly in an inter-procedural setting. Both 
solvers are accompanied by corresponding interfaces for problem descriptions 
which must be implemented to encode the data-flow problem. The details are 
provided in Sect. 5. 

All of PhASAR’s data-flow solvers are implemented in a fully generic manner
and make heavy use of templates and interfaces. For instance, a solver follows
a target program’s control-flow that is specified through an implementation of
either the CFG or the ICFG interface. Analysis developers can parameterize a
solver with an existing implementation or they can provide their own custom
implementation. They can run a forward or backward analysis depending on
the direction of the chosen control-flow graph. Moreover, all data-flow related
functionality is hidden behind interfaces. A solver queries the required
functionality, such as flow functions or merge operations for the underlying
lattice, whenever necessary. We have specified problem interfaces on which the
corresponding solver operates. Thus, analysis developers encode their data-flow
problem by providing an implementation for the problem interface and provide
this implementation to the accompanying solver. PhASAR is able to solve a problem
on other IRs when suitable implementations for the IR-specific parts, such as
the control-flow graphs and problem descriptions, are provided by the analysis
developer.


5 Implementation 


Our goal with PhASAR is easing the formulation of a data-flow analysis such
that an analysis developer only needs to focus on the implementation of the
problem description rather than providing details of how the problem is solved.

PhASAR achieves parts of its generalizability through template parameters. 
These template parameters include, among others, N, D, M. They are consistently 
used throughout the implementation of PhASAR. N denotes the type of a node in 
the ICFG, i.e., typically an IR statement, D denotes the domain of the data-flow 
facts, and M is a placeholder for the type of a method/function. When analyzing 
LLVM IR, N is always of type const llvm::Instruction* and M is of type const
llvm::Function*, whereas D depends on the specific data-flow analysis that the
developer wants to encode. For our example using linear constant propagation
described in Sect. 3, D = pair<const llvm::Value *, int> could be used to
capture the property of interest. LLVM’s Value type is quite useful as it is a 
super-type that is located high in the type hierarchy. This allows an analysis 
developer to use values of all of Value’s subtypes in the value domain, which 
makes it highly flexible. 


5.1 Encoding an IFDS Analysis 


Listing 1.2 shows the interface for an IFDS problem. An analysis developer has to 
define a new type—the problem description—implementing the FlowFunctions 
interface. 


template <typename N, typename D, typename M> struct FlowFunctions {
  virtual ~FlowFunctions() = default;
  virtual FlowFunction<D> *getNormalFlowFunction(N curr, N succ) = 0;
  virtual FlowFunction<D> *getCallFlowFunction(N callStmt,
                                               M destMthd) = 0;
  virtual FlowFunction<D> *getRetFlowFunction(N callSite,
                                              M calleeMthd,
                                              N exitStmt,
                                              N retSite) = 0;
  virtual FlowFunction<D> *
  getCallToRetFlowFunction(N callSite, N retSite, set<M> callees) = 0;
};


Listing 1.2. Interface for specifying flow functions in IFDS/IDE 


The flow function factories shown in Listing 1.2 handle the different types of 
flows. The four factory functions each have an individual purpose: 
- getNormalFlowFunction handles all intra-procedural flows.

- getCallFlowFunction handles inter-procedural flows at a call-site. Usually,
  the task of this flow function factory is to map the data-flow facts that hold
  at a given call-site into the callee method’s scope.

- getRetFlowFunction handles inter-procedural flows at an exit statement
  (e.g. a return statement). This maps the callee’s return value, as well as
  data-flow facts that may leave the function by reference or pointer parameters,
  back into the caller’s context/scope.

- getCallToRetFlowFunction propagates all data-flow facts that are not
  involved in a call along-side the call-site, typically stack-local data not
  referenced by parameters.


These flow function factories are automatically queried by the solver, based 
on the inter-procedural control-flow graph. 

The functions in Listing 1.2 are factories since they have to return small
function objects of type FlowFunction, which is shown in Listing 1.3. As a
FlowFunction is itself an interface, an analysis developer has to provide a
suitable implementation. The member function computeTargets() takes a value of
a data-flow fact of type D and computes a set of new data-flow facts of the same
type. It specifies how the bipartite graph for the statement that represents the
flow function is constructed and can be thought of as an answer to the question
“What edges must be drawn?”.

template <typename D> struct FlowFunction {
  virtual ~FlowFunction() = default;
  virtual set<D> computeTargets(D source) = 0;
};
Listing 1.3. Interface for a flow function in IFDS/IDE 
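
To make the role of computeTargets() concrete, the following sketch implements the interface for an assignment in a taint-style analysis. The `AssignFlow` class and the use of variable names as facts are assumptions made for this example; the interface is repeated (with `std::set` spelled out) only so that the sketch is self-contained.

```cpp
#include <set>
#include <string>
#include <utility>

// Interface from Listing 1.3, repeated so that the sketch compiles alone.
template <typename D> struct FlowFunction {
  virtual ~FlowFunction() = default;
  virtual std::set<D> computeTargets(D Source) = 0;
};

// Hypothetical flow function for an assignment "To = From" in a taint-style
// analysis whose facts are variable names. An analysis on LLVM IR would
// more likely use a type such as const llvm::Value * for its facts.
struct AssignFlow : FlowFunction<std::string> {
  std::string To;
  std::string From;

  AssignFlow(std::string To, std::string From)
      : To(std::move(To)), From(std::move(From)) {}

  std::set<std::string> computeTargets(std::string Source) override {
    if (Source == From) {
      return {From, To}; // the fact stays on From and also reaches To
    }
    if (Source == To) {
      return {}; // the old fact about To is overwritten
    }
    return {Source}; // unrelated facts pass through unchanged
  }
};
```

Each returned fact corresponds to one edge drawn from the source node to a target node in the bipartite graph of the statement.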


As flow function implementations often follow certain patterns, we provide
implementations for the most common patterns as template classes. Many useful
flow functions like Gen, GenIf, Kill, KillAll, and Identity are already
implemented and can be directly used. Any number of flow functions can be
easily combined using our implementations of the Compose and Union flow
functions. We also provide MapFactsToCallee and MapFactsToCaller flow functions
that automatically map parameters into a callee and back to a caller, since this
behavior is frequently desired. Flow functions which are stateless, e.g. Identity
or KillAll, are implemented as singletons.
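
The behavior of some of these patterns can be sketched with plain function objects. The following stand-ins only mirror the names mentioned above; they are our own simplification based on `std::function`, and PhASAR's actual template classes have their own signatures, which are not reproduced here.

```cpp
#include <functional>
#include <set>
#include <string>

using Fact = std::string;
using Flow = std::function<std::set<Fact>(const Fact &)>;

// The special tautological fact Λ from Sect. 3, spelled out as a string.
const Fact Zero = "<zero>";

// Identity: every fact is propagated unchanged.
Flow identity() {
  return [](const Fact &Source) { return std::set<Fact>{Source}; };
}

// Gen: generate NewFact from the zero fact and keep all existing facts.
Flow gen(Fact NewFact) {
  return [NewFact](const Fact &Source) {
    std::set<Fact> Out{Source};
    if (Source == Zero) {
      Out.insert(NewFact);
    }
    return Out;
  };
}

// Kill: drop the fact Killed and keep everything else.
Flow kill(Fact Killed) {
  return [Killed](const Fact &Source) {
    return Source == Killed ? std::set<Fact>{} : std::set<Fact>{Source};
  };
}

// Compose: apply First, then apply Second to each resulting fact.
Flow composeFlows(Flow First, Flow Second) {
  return [First, Second](const Fact &Source) {
    std::set<Fact> Out;
    for (const Fact &Mid : First(Source)) {
      std::set<Fact> Next = Second(Mid);
      Out.insert(Next.begin(), Next.end());
    }
    return Out;
  };
}
```

For example, `composeFlows(gen("x"), kill("y"))` describes a statement that makes x hold and invalidates y, while all other facts pass through.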


5.2 Encoding an IDE Analysis 


If an analysis developer wishes to encode their problem within IDE, they have
to additionally provide implementations for the edge functions. With the help of
the edge functions, an analysis developer is able to specify a computation which
is performed along the edges of the exploded super-graph leading to the queried
node (cf. Fig. 1). The interface for the edge function factories and their
responsibilities are analogous to the flow function factories in Listing 1.2.
Each edge function factory must return an edge function implementation: a small
function object similar to a flow function which has a computeTarget() function,
a compose, a merge, and an equality-check operation. The EdgeFunction interface
is shown in Listing 1.4.


template <typename V> class EdgeFunction {
public:
  virtual ~EdgeFunction() = default;
  virtual V computeTarget(V source) = 0;
  virtual EdgeFunction<V> *
  composeWith(EdgeFunction<V> *secondFunction) = 0;
  virtual EdgeFunction<V> *
  joinWith(EdgeFunction<V> *otherFunction) = 0;
  virtual bool equal_to(EdgeFunction<V> *other) const = 0;
};


Listing 1.4. Interface for an edge function in IDE 


As this interface is more complex than the flow function interface, we explain 
the purpose of each function. The computeTarget() function describes a com- 
putation over the value domain V in terms of lambda calculus. 

The composeWith() function encodes how to compose two edge functions. 
In most scenarios, this function can be implemented as (f o g)(x) = f(g(x)). To 
avoid additional boilerplate code, we provide an EdgeFunctionComposer class 
that performs this job and can be used as a super class. 

joinWith() encodes how to join two edge functions at statements where two
control-flow edges lead to the same successor statement. Depending on whether a
may- or a must-analysis is performed, implementations of this function typically
check which edge function computes a value that is higher up in the lattice, i.e.,
a more approximate value, and return the corresponding edge function. For our
linear constant propagation from Sect. 3, this function would return one of the
edge functions if both describe the same value computation, the bottom edge
function if both of them encode the ⊥ value, and the edge function encoding the
top element otherwise. The intuition here is to always pick the element that is
higher in the lattice as it represents more information.

The equal_to() interface function has to be implemented to return true if 
both edge functions describe the same value computation, false otherwise. 
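
The join and equality rules just described for the linear constant propagation can be sketched as follows. To keep the example short, it uses a value-based stand-in instead of the pointer-based interface of Listing 1.4; the `LinearEdge` type and the free functions are assumptions made for this illustration.

```cpp
// Value-based stand-in for an edge function of the linear constant
// propagation: the bottom edge function, the top edge function, or a
// linear function  λx. A*x + B.  (Hypothetical type for illustration.)
struct LinearEdge {
  enum Kind { Bottom, Linear, Top } K;
  int A;
  int B;
};

LinearEdge makeLinear(int A, int B) { return {LinearEdge::Linear, A, B}; }
LinearEdge makeTop() { return {LinearEdge::Top, 0, 0}; }
LinearEdge makeBottom() { return {LinearEdge::Bottom, 0, 0}; }

// Counterpart of equal_to(): do both edge functions describe the same
// value computation?
bool equalTo(const LinearEdge &L, const LinearEdge &R) {
  if (L.K != R.K) {
    return false;
  }
  return L.K != LinearEdge::Linear || (L.A == R.A && L.B == R.B);
}

// Counterpart of joinWith(), following the rule stated in the text:
// identical computations are kept (two bottom functions stay bottom),
// every other combination yields the top edge function.
LinearEdge joinWith(const LinearEdge &L, const LinearEdge &R) {
  if (equalTo(L, R)) {
    return L;
  }
  return makeTop();
}
```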

A complete implementation of the IDE linear constant propagation can be 
found along with PhASAR’s other examples at our website [23]. 


5.3 Encoding a Monotone Analysis 


If an analysis developer wishes to encode a problem that does not satisfy the 
distributivity property, they have to make use of the monotone-framework imple- 
mentation or its inter-procedural variant. The interface for specifying an inter- 
procedural monotone problem is shown in Listing 1.5. Similar to an IFDS/IDE 
problem, an analysis developer has to specify flow functions for intra- and inter- 
procedural flows. But in contrast to IFDS/IDE, these flow functions do not oper- 
ate on single, distributive data-flow facts, but on sets of data-flow facts instead. 
The solver calls the flow functions and provides the set of data-flow facts which 
hold right before the current statement. The return value to be computed in the
flow function is a set of data-flow facts that hold after the effects of the current
statement. The join() function specifies how information is merged when two
branches join at a common successor statement. This is typically implemented
as set-union or set-intersection depending on whether a may or must-analysis
has to be solved. Algorithms from C++’s STL may be used here. Finally, the
sqSubSetEqual() function must be implemented to determine if the amount 
of information between two sets has increased in order to check if a fixpoint 
is reached. The context that is used for the inter-procedural analysis can be 
specified by the analysis developer using the template parameter. An analysis 
developer can provide a pre-defined context class in order to parameterize the 
analysis to be a call-strings approach, a value-based approach, or they can define 
their own context to be used. 


template <typename N, typename D, typename M, typename I>
struct InterMonotoneProblem {
  I ICFG; // inter-procedural control-flow graph the problem is solved on
  InterMonotoneProblem(I Icfg) : ICFG(Icfg) {}
  virtual ~InterMonotoneProblem() = default;
  virtual set<D> join(const set<D> &Lhs, const set<D> &Rhs) = 0;
  virtual bool sqSubSetEqual(const set<D> &Lhs,
                             const set<D> &Rhs) = 0;
  virtual set<D> normalFlow(N Stmt, const set<D> &In) = 0;
  virtual set<D> callFlow(N CallSite, M Callee, const set<D> &In) = 0;
  virtual set<D> returnFlow(N CallSite, M Callee, N RetStmt,
                            N RetSite, const set<D> &In) = 0;
  virtual set<D> callToRetFlow(N CallSite, N RetSite,
                               const set<D> &In) = 0;
};
Listing 1.5. Interface for describing an interprocedural problem for the monotone 
framework 
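
The join() and sqSubSetEqual() operations can be sketched with standard-library algorithms as follows. The free function templates and their names are our own choice for this example; in an actual problem description they would be written as overrides of the member functions in Listing 1.5.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// join() for a may-analysis: set union of the facts of both branches.
template <typename D>
std::set<D> joinMay(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  std::set<D> Out;
  std::set_union(Lhs.begin(), Lhs.end(), Rhs.begin(), Rhs.end(),
                 std::inserter(Out, Out.end()));
  return Out;
}

// join() for a must-analysis: set intersection of the facts.
template <typename D>
std::set<D> joinMust(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  std::set<D> Out;
  std::set_intersection(Lhs.begin(), Lhs.end(), Rhs.begin(), Rhs.end(),
                        std::inserter(Out, Out.end()));
  return Out;
}

// sqSubSetEqual() for the may-analysis ordering: Lhs is below or equal to
// Rhs if every fact of Lhs is also contained in Rhs.
template <typename D>
bool sqSubSetEqual(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  return std::includes(Rhs.begin(), Rhs.end(), Lhs.begin(), Lhs.end());
}
```

A solver can detect a fixpoint with such a check: if the newly computed set is below or equal to the stored one, no information was added and the successor statements need not be revisited.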


5.4 Handling of Intrinsic and Libc Function Calls 


LLVM currently has approximately 130 intrinsic functions. These functions are 
used to describe semantics in the analysis and optimization phase and do not 
have an actual implementation. Later on in the compiler pipeline, the back-end is
free to replace a call to an intrinsic function with a software or a hardware
implementation, if one exists for the target architecture. Introducing new intrinsic
functions is preferred over introducing novel instructions to LLVM since, when 
introducing a new instruction, all optimizations, analyses, and tools built on top 
of LLVM have to be revisited to make them aware of the new instruction. A call 
to an intrinsic function can be handled as an ordinary function call. 

The functions contained in the libc standard library represent special targets
as well, as these functions are used by virtually all practical C and C++¹
programs. Moreover, the functions contained in the standard library cannot be
analyzed themselves as they are mostly very thin wrappers around system calls
and are often not available for the analysis. In many cases, however, it is not
necessary to analyze these functions when performing a data-flow analysis. PhASAR
models all of them as the identity function. An analysis developer can change the
default behavior and model different effects by using special summary functions.
The SpecialSummaries class can be used to register flow and edge functions
other than identity. This class is aware of all intrinsic and libc functions.

¹ The compiler translates many of C++’s features into ordinary calls to libc.
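
The idea of identity by default with optional overrides can be sketched as
follows. The SummaryRegistry class, its member functions, and the memset()
example are hypothetical and only illustrate the concept; they do not reproduce
the API of PhASAR’s SpecialSummaries class.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

using Facts = std::set<std::string>;
using FlowFunction = std::function<Facts(const Facts &)>;

// Hypothetical registry; not PhASAR's actual SpecialSummaries class.
class SummaryRegistry {
  std::map<std::string, FlowFunction> Summaries;

public:
  // Override the default (identity) effect of a libc or intrinsic function.
  void registerSummary(const std::string &Callee, FlowFunction F) {
    Summaries[Callee] = std::move(F);
  }
  // Apply the registered summary, or fall back to identity.
  Facts apply(const std::string &Callee, const Facts &In) const {
    auto It = Summaries.find(Callee);
    return It == Summaries.end() ? In : It->second(In);
  }
};

// Illustrative assumption: memset() overwrites a buffer named "buf", so the
// summary kills the corresponding fact.
inline SummaryRegistry makeExampleRegistry() {
  SummaryRegistry R;
  R.registerSummary("memset", [](const Facts &In) {
    Facts Out = In;
    Out.erase("buf");
    return Out;
  });
  return R;
}
```

Calling apply() for a function without a registered summary returns the
incoming facts unchanged, which mirrors the default identity behavior described
above.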


5.5 A Note on Soundness 


Livshits et al. have introduced the notion of soundy analyses [18]. Soundy analyses
use sensible underapproximations to cope with certain language features that
would otherwise make an analysis impractically imprecise. Analyses in PhASAR
are currently soundy. For instance, PhASAR’s ICFG misses one control-flow edge
in the presence of setjmp()/longjmp(). Functions that are loaded dynamically
from shared object libraries using dlsym() cannot be handled either. PhASAR’s
data-flow solvers treat calls to dynamically loaded libraries and libraries for
which function definitions are missing as identity, unless the analysis developer
specifies otherwise. A sound handling would be to set all variables involved in
such calls to ⊤ (top), which, again, may lead to large imprecision.


6 Scalability 


In this section, we present the runtime measurements for two concrete static
analyses that are both implemented in PhASAR: IFDSSolverTest, which we name I,
and IFDSTaintAnalysis, which we name T. I is a trivial IFDS analysis which
passes the tautological data-flow fact Λ through the program. The analysis acts
as a baseline as it is the most efficient IFDS/IDE analysis that can possibly
be implemented. T implements a taint analysis. A taint analysis tracks values
that have been tainted by one or more sources through the program and reports
whenever one of the tainted values reaches a sink, which can be a function or
an instruction. Our taint analysis treats the command-line parameters argc and
argv that are passed into the main() function as tainted. Functions that read
values from the outside (e.g., fread()) are interpreted as sources. Functions that
can leak tainted variables to the outside, such as printf() or fwrite(), are
considered sinks. As a potentially large number of tainted values has to be
tracked through the program, analysis T will provide insights into the scalability
of PhASAR’s IFDS/IDE solver implementation.
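
The source/sink principle behind a taint analysis can be sketched on a toy
straight-line program. This is not an IFDS formulation and not PhASAR’s
IFDSTaintAnalysis; the Stmt encoding and the function findLeaks() are
assumptions made for illustration only.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy three-address statements; all names here are illustrative.
enum class Kind { Source, Assign, Sink };

struct Stmt {
  Kind K;
  std::string Dst; // variable written (Source, Assign)
  std::string Src; // variable read (Assign, Sink)
};

// Propagates taint through a straight-line program and returns the indices
// of all sink statements that receive a tainted value. Tainted holds the
// initially tainted variables (e.g., argc and argv).
std::vector<std::size_t> findLeaks(const std::vector<Stmt> &Prog,
                                   std::set<std::string> Tainted) {
  std::vector<std::size_t> Leaks;
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    const Stmt &S = Prog[I];
    switch (S.K) {
    case Kind::Source: // e.g., Dst = fread(...): generate a fact
      Tainted.insert(S.Dst);
      break;
    case Kind::Assign: // Dst = Src: generate or kill the fact for Dst
      if (Tainted.count(S.Src))
        Tainted.insert(S.Dst);
      else
        Tainted.erase(S.Dst);
      break;
    case Kind::Sink: // e.g., printf(Src): report if Src is tainted
      if (Tainted.count(S.Src))
        Leaks.push_back(I);
      break;
    }
  }
  return Leaks;
}
```

Each insert corresponds to a generated fact and each erase to a killed fact,
which is what the #G and #K columns of Table 1 count for the real analysis.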

Table 1 shows the programs that we analyzed. They comprise C programs, namely
some of the coreutils [3], as well as two C++ programs: PhASAR itself and the
PhASAR-based tool MPT. For each program, the IR’s lines of code, number of
statements, pointers, and allocation sites have been measured with PhASAR. The
LLVM IR has been compiled with the Clang compiler using production flags. The
figures give an intuition for the program’s complexity. In addition, Table 1
shows the runtimes of the analyses I and T separated into different phases (in
the format runtime I/runtime T). We measured the runtimes for the construction
of points-to information (PT), class hierarchy (CH), call-graph (CG), data-flow
information (DF), and the total runtime (Σ). We also measured the number of
function summaries (ψ(f)) that could be reused while solving the analysis. The
latter is a good indicator for the quality of the data-flow domain D, as higher
reuse indicates a more efficient analysis. #G and #K denote the number of facts
that have been generated or killed in the taint analysis, respectively.

We measured the runtimes by performing 15 runs for each analysis on a 
virtual machine running on an Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30 GHz 
machine with 128 GB memory. We removed the minimum and maximum values 
and computed the average of the remaining 13 values for each of the four analysis 
steps and the total runtime. We used an on-the-fly call-graph algorithm that
uses points-to information for the coreutils. For PhASAR and MPT, we used a
declared-type analysis (DTA) call-graph algorithm in order to reduce the amount
of memory required to reproduce our results. In addition, we found that DTA
performed well enough on our C++ target programs.
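
The averaging scheme described above (drop the minimum and the maximum, then
average the remaining values) can be sketched as follows. The function name
trimmedMean() is an assumption; the sketch is not taken from the measurement
scripts used for the paper.

```cpp
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Drops one minimum and one maximum sample and averages the remaining ones.
// For 15 measured runs, this averages the 13 values in between.
double trimmedMean(std::vector<double> Samples) {
  if (Samples.size() < 3)
    throw std::invalid_argument("need at least three samples");
  std::sort(Samples.begin(), Samples.end());
  double Sum = std::accumulate(Samples.begin() + 1, Samples.end() - 1, 0.0);
  return Sum / static_cast<double>(Samples.size() - 2);
}
```

Removing the two extremes makes the reported average less sensitive to a single
outlier run, for example one that was disturbed by other load on the virtual
machine.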

With one exception, PhASAR is able to analyze a program from coreutils
within a few seconds. Analyzing cp using T takes around 13 min. This is because
a large number of facts is generated, which must then be propagated by the
solver. This result shows the cubic impact of the number of data-flow facts
on IFDS/IDE’s complexity. Analyzing the million-line programs PhASAR and
MPT takes between 7 and 18 min. As one can observe for PhASAR, an analysis may
destroy data-flow facts more often than it generates them. This is caused by
C++’s exceptional control flow, where the same fact is destroyed during both
normal and exceptional flow.

We observed that the DF part of T actually runs faster than I for our C++
target programs. T behaves very similarly to the solver test I on these
programs, as only very few facts are actually generated. Furthermore, T takes
shortcuts whenever it plugs in the desired effects at call-sites of source and
sink functions. I, in contrast, follows these calls, making it slower than T.


Table 1. Program characteristics and performance figures for analyses I/T

Program  kLOC    Stmts    Ptrs  Allocs  CH [ms]  PT [s]   CG [s]   DF [s]   Σ [s]     #ψ(f)            #G     #K
wc         32    63166   10644     396  24/24    1.0/1.0  0.1/0.1  0.2/11   2/13      119/125       10202   6830
ls         52    71712   13200     438  27/27    1.4/1.4  1.1/1.2  0.6/1.0  4/5       836/839          79     74
cat        30    62588   10584     391  24/24    1.0/1.0  0.0/0.0  0.1/1.3  2/3       21/22          2525   1262
cp         41    67097   11722     443  32/30    1.3/1.3  0.6/0.6  0.4/789  3/792     547/737       16999  12839
whoami     29    61860   10433     389  24/23    1.0/1.0  0.0/0.0  0.1/0.3  2/2       8/11             97     92
dd         37    65287   11150     408  25/25    1.1/1.0  0.2/0.2  0.2/37   2/40      164/176       14711  11058
fold       30    62201   10509     390  24/23    1.0/1.0  0.0/0.0  0.1/0.3  2/2       17/22           107    102
join       34    64196   11042     402  24/24    1.0/1.0  0.0/0.0  0.1/0.5  2/3       91/95           104     94
kill       30    62304   10527     394  24/24    1.0/1.0  0.0/0.0  0.1/0.1  2/2       24/24            22      4
uniq       31    62663   10650     396  24/24    1.0/1.0  0.0/0.0  0.1/0.4  2/2       50/53            96     90
MPT      3514  1351735  755567  176540  906/903  22/22    8.8/8.8  458/379  519/439   12531/12532      20      9
PhASAR   3554  1368297  763796  178486  962/946  23/23    24/24    987/917  1064/993  25778/25782      56     77

Analyzing all of the 97 coreutils, PhASAR, and MPT requires a total analysis
time of 30 min for I and 1 h 31 min for T. These measurements show that
PhASAR is capable of analyzing even a million-line program within minutes,
even though PhASAR’s algorithms and data structures have not yet undergone
manual optimization.


7 Guidelines for the Analysis on Real-World Code 


In this section, we share our experience in analyzing real-world C/C++ programs.
Although the LLVM IR is expressive enough to capture arbitrary source
languages, we found that the characteristics and complexity of the source
language propagate into the IR. Consider the following call-site in LLVM IR:

  %retval = call i32 %fptr(%class.S* dereferenceable(4) %ptr, i32 5)

Assuming C to be the source language, a plain function pointer is called. If
C++ is the source language, we cannot be sure whether a function pointer or a
virtual member function of class S is called. This is the reason why the
analysis runtime that we observed for C++ target programs is usually much
higher than for C programs.
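
The ambiguity can be reproduced with a small C++ sketch. Both functions below
contain a call whose target is only known at run time; when compiled to IR
(for instance with clang++ -S -emit-llvm), each typically becomes an indirect
call through a loaded pointer whose first argument refers to an S object. The
class and function names are assumptions chosen for illustration, and the exact
IR depends on the compiler version and flags.

```cpp
// A class with a virtual member function and one data member.
struct S {
  int Field = 0;
  virtual ~S() = default;
  virtual int scale(int X) { return X; }
};

struct Derived : S {
  int scale(int X) override { return 2 * X; }
};

// A free function with the same shape as a member function of S.
int addField(S &Obj, int X) { return Obj.Field + X; }

// Call through a plain function pointer: the target is a run-time value.
int callViaPointer(int (*Fptr)(S &, int), S &Obj) { return Fptr(Obj, 5); }

// Virtual call: the target is loaded from the object's vtable at run time.
int callVirtual(S &Obj) { return Obj.scale(5); }
```

To resolve the second kind of call, an analysis needs class hierarchy
information in addition to points-to information, which is part of the extra
work observed for C++ targets.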

For more complex languages like C++, we have to keep track of special member
functions. These functions are mapped into ordinary LLVM IR functions
that Clang places in a well-defined order in the generated IR. For some analyses 
like the declared-type analysis (DTA) call-graph algorithm, we need to be aware 
of these special member functions in order to preserve high precision. 

We also found that even a well-debugged analysis that has been hardened on
a large variety of test programs may still fail on production code as some corner
cases have not been thought of. The large amount of information available to an
analysis run makes debugging errors hard. A standard debugger does not suffice
because an analysis writer has to step through a lot of code that is not relevant
to them. For Java, a dedicated debugger for static analysis has been
developed [21], which shows the relevance of the problem.

Depending on the optimization passes that have been applied to code in 
LLVM IR before it is handed over to the analysis, it may have very different 
characteristics. Although optimization passes are required to have no impact 
on the semantics, the structure of the IR code changes. In our experience, it is 
helpful to start developing an analysis on small test programs that are translated 
into IR without optimization passes, and cover as many cases as the analysis 
should find. Once an analysis handles these test cases correctly or with the 
desired precision, optimization passes should be applied to the test cases. After
rerunning the analysis, the results should be checked against those obtained on
the unoptimized version. When applying an analysis to production code, the code should be
compiled using production flags in order to analyze code that is as close as 
possible to what actually runs on the machine. 
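
Checking the results on optimized IR against those on unoptimized IR can be
automated with a simple set comparison. The sketch below assumes that each
finding can be reduced to a stable string key chosen by the analysis writer;
the types Findings and ResultDiff and the function diffFindings() are
hypothetical and not part of PhASAR.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// One stable key per finding; the key format is up to the analysis writer.
using Findings = std::set<std::string>;

struct ResultDiff {
  Findings OnlyUnoptimized; // reported only on the unoptimized IR
  Findings OnlyOptimized;   // reported only on the optimized IR
  bool empty() const {
    return OnlyUnoptimized.empty() && OnlyOptimized.empty();
  }
};

// Computes the findings that appear in only one of the two analysis runs.
ResultDiff diffFindings(const Findings &Unopt, const Findings &Opt) {
  ResultDiff D;
  std::set_difference(
      Unopt.begin(), Unopt.end(), Opt.begin(), Opt.end(),
      std::inserter(D.OnlyUnoptimized, D.OnlyUnoptimized.end()));
  std::set_difference(
      Opt.begin(), Opt.end(), Unopt.begin(), Unopt.end(),
      std::inserter(D.OnlyOptimized, D.OnlyOptimized.end()));
  return D;
}
```

A non-empty difference does not necessarily indicate a bug, since an
optimization may legitimately remove the code that caused a finding, but every
entry is worth inspecting.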

We found that the usage of debug symbols is helpful. The Clang compiler’s
-g flag can be added to propagate the debug symbols into the IR. Those can
then be queried using LLVM’s corresponding API. However, the debug symbols
may not always be present, which is why an analysis should not rely on them.


8 Future Work


In this section we briefly summarize our plans for future improvements. 

It would be interesting to evaluate the use of PhASAR for analyzing a different
IR. One type of IR might have advantages over others for different analysis
problems. We plan to additionally support the GENERIC, GIMPLE, and RTL [5,19]
IRs from the GCC project.

Another interesting framework for data-flow analysis is Weighted Pushdown 
Systems (WPDS) [25,28]. WPDS is able to compute an analysis within a stack 
automaton. WPDS allows for more compact data structures, the generation of 
witnesses, as well as precise queries specifying paths of interest using regular 
expressions. We plan to support WPDS in a future version of PhASAR using
the weighted/nested-word automaton library [32]. 

Checking the correctness of an IFDS/IDE analysis is complex since checking 
the correctness of the underlying ESG is tedious and time-consuming. A
high-quality visualization may help reduce the amount of time spent debugging an
analysis. A graphical user interface will reduce the amount of knowledge that is 
required to use the framework. 

Since the flow and edge functions have to be implemented in a general-purpose
programming language, they require some amount of boilerplate code. It remains
an open question if one could design a non-Turing-complete EDSL with a library
like boost::proto [1] which simplifies the task of encoding analysis problems.

PhASAR currently uses LLVM’s points-to information which is rather impre- 
cise. We plan to integrate a more precise pointer analysis into PhASAR to sup- 
port more precise call-graph construction and client analyses by adapting the 
demand-driven Boomerang approach presented in [29] to PhASAR. 


9 Conclusion 


In this paper, we presented our implementation of a static analysis framework for 
programs written in C/C++ named PhASAR. We presented its architecture and 
implementation from a user’s perspective to make practical static analysis more 
accessible. We presented experiments which have shown PhASAR’s scalability 
and discussed the runtimes of the key parts of two concrete client analyses. 

With PhASAR we strive toward the goals of providing a framework for static 
analysis targeting (but not limited to) C/C++, a base for quickly evaluating 
novel ideas and applications, and a suitable way of handling the complexity. 
PhASAR is open-source and available online [23] under the permissive MIT
licence, and is therefore open for contributions, feedback, and use. PhASAR has
already received tremendous support in the research community and from
practitioners, as 223 stars and 26 forks on GitHub show.²


² As of 8 am on February 7, 2019.